Abstract

We consider linear inverse problems in a nonparametric statistical
framework. Both the signal and the operator are unknown and subject to
measurement errors. We establish minimax rates of convergence under
squared error loss when the operator admits a blockwise singular value
decomposition (blockwise SVD) and the smoothness of the signal is
measured in a Sobolev sense. We construct a nonlinear procedure adapting
simultaneously to the unknown smoothness of both the signal and the
operator and achieving the optimal rate of convergence to within
logarithmic terms. When the noise level in the operator is dominant, by
taking full advantage of the blockwise SVD property, we demonstrate that
the block SVD procedure outperforms classical methods based on Galerkin
projection [14] or nonlinear wavelet thresholding [18]. We subsequently
apply our abstract framework to the specific case of blind deconvolution
on the torus and on the sphere.
1 Introduction

1.1 Motivation

Consider the following idealised statistical problem: estimate a function
$f$ (a signal, an image) from data

$$y_n = Kf + n^{-1/2}\,\dot W, \tag{1.1}$$

where

$$K : \mathbb{H} \to \mathbb{G}$$

is a linear operator between two Hilbert spaces $\mathbb{H}$ and
$\mathbb{G}$. The observation of the unknown $f \in \mathbb{H}$ is
challenged by the action of the linear degradation $K$ as well as
contaminated by an experimental Gaussian white noise $\dot W$ on
$\mathbb{G}$ with vanishing noise level $n^{-1/2}$ as $n \to \infty$.
Alternatively, in a density estimation setting, we observe a random
sample $(Z_1, \ldots, Z_n)$ drawn from a probability distribution$^1$
with density $Kf$. In each case, we do not know the operator $K$ exactly,
but we have access to

$$K_\delta = K + \delta\,\dot B, \tag{1.2}$$

where $\dot B$ is a Gaussian white noise on $\mathbb{H} \times \mathbb{G}$,
thanks to preliminary experiments or calibration through trial functions.
This setting has been discussed in detail in [14, 18]. In this paper, we
are interested in operators $K$ admitting a singular value decomposition
(SVD) or a blockwise SVD. In essence, we know the typical eigenfunctions
of $K$ but not the eigenvalues. We cover two specific examples of
interest: spherical and circular deconvolution.

Spherical deconvolution. Used for the analysis of data distributed on the
celestial sphere, see Section 4.1 below. One observes a random sample
$(Z_1, \ldots, Z_n)$ with

$$Z_i = \varepsilon_i X_i, \quad i = 1, \ldots, n,$$

where the $\varepsilon_i$ are random elements in $SO(3)$, the group of
$3 \times 3$ rotation matrices, and the $X_i$ are independent and
identically distributed on the sphere $\mathbb{S}^2$, with common density
$f$ with respect to the uniform probability distribution on
$\mathbb{S}^2$. In this setting, if the $\varepsilon_i$ have common
density $g$ with respect to the Haar measure $du$ on $SO(3)$, we have

$$Kf(x) = g * f(x) = \int_{SO(3)} g(u) f(u^{-1}x)\,du, \quad x \in \mathbb{S}^2.$$

We are interested in the case where the exact form of $g$ is unknown.
However, $K$ is block-diagonal in the spherical harmonic basis. □

$^1$ In that setting, $Kf$ must therefore also be a probability density.

Circular deconvolution. Used for restoring signals or images, see Section
4.2 below. We take $\mathbb{H} = \mathbb{G} = L^2(\mathbb{T})$, the space
of square integrable functions on the torus $\mathbb{T} = [0, 1]$ (or
$[0, 1]^d$) appended with periodic boundary conditions. We have

$$Kf(x) = g * f(x) = \int_{\mathbb{T}} g(u) f(x - u)\,du, \quad x \in \mathbb{T}.$$

The degradation process $K = g * \cdot$ is characterised by the impulse
response function $g$, which we do not know exactly. However, $K$ is
diagonal in the Fourier basis. □

Although the problem of estimating $f$ is fairly classical and well
understood when $K$ is known (a selected literature is
[32, 8, 12, 1, 17, 29, 28] and the references therein), only moderate
attention has been paid to the case of an unknown $K$, despite its
relevance in practice. When only the eigenfunctions of $K$ are known, we
have the results of Cavalier and Hengartner [5] and Cavalier and
Raimondo [6], but they are confined to the case where the error in the
operator is negligible, $\delta \ll n^{-1/2}$. In a general setting with
error in the operator, Efromovich and Koltchinskii [14] and later
Hoffmann and Reiß [18] studied the recovery of $f$ when the
eigenfunctions and eigenvalues of $K$ are unknown. In both contributions,
only marginal attention is paid to the case of sparse or diagonal
operators, but it is shown in both papers that unusual rates of
convergence can be obtained when $n^{-1/2} \ll \delta$. In a univariate
setting, Neumann [24] and Comte and Lacour [9] consider the case of
deconvolution with an error density known only through an auxiliary set
of $m$ learning data. This formally corresponds to having
$\delta = m^{-1/2}$ in our setting. Minimax rates and adaptive estimators
are derived in both regimes $m \ll n$ and $n \ll m$. We address in the
paper the following program:

i) Construction of a feasible procedure $\hat f_{n,\delta}$ estimating $f$
from data (1.1) and (1.2) that achieves optimal rates of convergence
(up to inessential logarithmic terms). We require $\hat f_{n,\delta}$
to be adaptive with respect to smoothness constraints on $f$ and $K$.

ii) Identification of the best achievable accuracy for $f$ under
smoothness constraints on $f$ and $K$, so that the interplay between
$n^{-1/2}$ and $\delta$ can be explicitly related in the asymptotics
$\delta \to 0$ and $n \to \infty$; this includes the comparison with
earlier results of [24, 14, 18] in the context of blockwise SVD.

iii) Application to spherical deconvolution on $\mathbb{S}^2$ or circular
deconvolution on the torus; this includes the discussion of our
findings in terms of the existing literature on the topic [7, 25, 22]
and some practical aspects of numerical implementation.

1.2 Main results and organisation of the paper

In Section 2, we present an abstract framework that allows for operators
$K$ to admit a so-called blockwise SVD. This property simply translates
into the existence of pairs of increasing finite dimensional spaces
$(H_\ell, G_\ell)$ that are stable under the action of $K$. The blockwise
SVD property is further appended with a smoothness condition quantified
by the arithmetic decay of the operator norm of $K$ and its inverse on
$H_\ell$ (resp. $G_\ell$) (the so-called ordinary smooth assumption, see
e.g. [32]). By means of a reconstruction formula, we obtain in Section
2.2 an estimator $\hat f_{n,\delta}$ of $f$ by first inverting
$K_\delta$ on $H_\ell$ with a thresholding tuned with $\delta$ and then
filtering the resulting signal by a block thresholding tuned with
$n^{-1/2}$. As for i) and ii), we establish in Theorems 3.1 and 3.2 of
Section 3 the minimax rates of convergence for Sobolev constraints on $f$
under squared error loss and we demonstrate that $\hat f_{n,\delta}$ is
optimal and adaptive to within logarithmic terms. The explicit interplay
between $\delta$ and $n^{-1/2}$ is revealed and discussed in the case of
sparse operators when $n^{-1/2} \ll \delta$, completing earlier findings
in [14, 18] and to some extent [24] in the univariate case for density
deconvolution. In particular, we demonstrate that a certain parametric
regime dominates when the smoothness of the signal dominates the
smoothing properties of the operator. Concerning iii), the method is
applied to the case of spherical and circular deconvolution in Section 4,
where harmonic Fourier analysis provides an explicit blockwise SVD for
the convolution operator. We illustrate the numerical feasibility of
$\hat f_{n,\delta}$ and the phenomena that appear in the case
$n^{-1/2} \ll \delta$. Section 5 is devoted to the proofs.

We choose to state and prove our results in the white Gaussian model
generated by the observation of $y_n$ and $K_\delta$ defined by (1.1) and
(1.2). The extension to the case of density estimation, when $y_n$ is
replaced by the observation of a random sample of size $n$ drawn from the
distribution $Kf$, as for instance in [24, 9], is a bit more involved,
due to the intrinsic heteroscedasticity that appears when enforcing a
formal analogy with the Gaussian setting (1.1). It is briefly addressed
in the discussion of Section 3.2.

2 Estimation by blockwise SVD

2.1 The blockwise SVD property

Let $\mathcal{G}$ denote a family of linear operators

$$K : \mathbb{H} \to \mathbb{G}$$

between two Hilbert spaces $\mathbb{H}$ and $\mathbb{G}$ that shall
represent our parameter set of unknown $K$.

A fundamental property (Assumption 2.1 below) is that an explicit
singular value decomposition (SVD) or blockwise SVD is known for all
$K \in \mathcal{G}$ simultaneously. More specifically, we suppose that
there exist two explicitly known bases $(e_\lambda, \lambda \in \Lambda)$
of $\mathbb{H}$ and $(g_\lambda, \lambda \in \Lambda)$ of $\mathbb{G}$, as
well as a partition $\Lambda = \cup_{\ell \ge 1} \Lambda_\ell$ with
$\Lambda_\ell \cap \Lambda_{\ell'} = \emptyset$ if $\ell \ne \ell'$, and a
constant $d \ge 1$ such that:

$$\ell^{d-1} \lesssim |\Lambda_\ell| \lesssim \ell^{d-1},$$

where $\lesssim$ means inequality up to a multiplicative constant that
does not depend on $\ell$. Here $|\Lambda_\ell| = \mathrm{Card}(\Lambda_\ell)$.

It is worthwhile to notice that in our examples, as well as in the rates
of convergence that we will exhibit later, $d$ plays the role of a
dimension. In particular, $d = 1$ corresponds to a 'standard SVD',
whereas $d > 1$ creates blocks and deserves the name of 'blockwise' SVD.
However, there is no need in the paper to assume that $d$ is in
$\mathbb{N}$. Set

$$H_\ell = \mathrm{Span}\{e_\lambda, \lambda \in \Lambda_\ell\} \quad\text{and}\quad G_\ell = \mathrm{Span}\{g_\lambda, \lambda \in \Lambda_\ell\}.$$

The Galerkin projection of an operator $T : \mathbb{H} \to \mathbb{G}$
onto $(H_\ell, G_\ell)$ is defined by $T_\ell = P_\ell T_{|H_\ell}$, where
$P_\ell$ is the orthogonal projector onto $G_\ell$.

Assumption 2.1 (Blockwise SVD).

$$K_{|H_\ell} = K_\ell \quad\text{for every } K \in \mathcal{G},\ \ell \ge 1.$$

We further need to quantify the action of $K$ on $H_\ell$. We denote by
$\|T_\ell\|_{H_\ell \to G_\ell} = \sup_{v \in H_\ell, \|v\|_{\mathbb{H}} = 1} \|T_\ell v\|_{\mathbb{G}}$
the operator norm of $T_\ell$.
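Assumption 2.2 (Spectral behaviour of $K_{|H_\ell}$). For every
$\ell \ge 1$, $K_\ell$ is invertible and there exists $\nu \ge 0$ such
that

$$Q_1(K) = \sup_{\ell \ge 1} \ell^{-\nu} \|(K_\ell)^{-1}\|_{G_\ell \to H_\ell} < \infty$$

and

$$Q_2(K) = \sup_{\ell \ge 1} \ell^{\nu} \|K_\ell\|_{H_\ell \to G_\ell} < \infty$$

for every $K \in \mathcal{G}$.

We associate with the bases $(e_\lambda, \lambda \in \Lambda)$ and
$(g_\lambda, \lambda \in \Lambda)$ the following decompositions

$$f = \sum_{\ell \ge 1} \sum_{\lambda \in \Lambda_\ell} \langle f, e_\lambda \rangle e_\lambda, \qquad g = \sum_{\ell \ge 1} \sum_{\lambda \in \Lambda_\ell} \langle g, g_\lambda \rangle g_\lambda \quad\text{for every } f \in \mathbb{H},\ g \in \mathbb{G},$$

where $\langle \cdot, \cdot \rangle$ denotes the inner product either in
$\mathbb{H}$ or $\mathbb{G}$, and the scale of Sobolev spaces

$$W^s = \Big\{f \in \mathbb{H},\ \|f\|^2_{W^s} = \sum_{\ell \ge 1} \ell^{2s} \sum_{\lambda \in \Lambda_\ell} \langle f, e_\lambda \rangle^2 < \infty\Big\}, \quad s \in \mathbb{R}, \tag{2.1}$$

$$\mathcal{W}^s = \Big\{g \in \mathbb{G},\ \|g\|^2_{\mathcal{W}^s} = \sum_{\ell \ge 1} \ell^{2s} \sum_{\lambda \in \Lambda_\ell} \langle g, g_\lambda \rangle^2 < \infty\Big\}, \quad s \in \mathbb{R}.$$

For $\nu \ge 0$, Assumption 2.2 implies that
$K : W^{-\nu/2} \to \mathcal{W}^{\nu/2}$ is continuous. In particular,
when $\nu > 0$, the operator $K$ is ill-posed with degree $\nu$, see for
instance [26].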

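2.2 Blockwise SVD reconstruction with noisy data

Under Assumptions 2.1 and 2.2, we have the reconstruction formula

$$f = \sum_{\ell \ge 1} (K_\ell)^{-1} \Big( \sum_{\lambda \in \Lambda_\ell} \langle Kf, g_\lambda \rangle g_\lambda \Big). \tag{2.2}$$

By the observed blurred version $K_\delta$ of $K$ in (1.2), we obtain a
family of estimators of $(K_\ell)^{-1}$ from data (1.2) by considering
the operator

$$(K_{\delta,\ell})^{-1}\, \mathbf{1}_{\{\|(K_{\delta,\ell})^{-1}\|_{G_\ell \to H_\ell} \le \kappa_\ell\}}, \tag{2.3}$$

where $\kappa_\ell > 0$ is a cutoff level, possibly depending on $\ell$.
Likewise, the coefficient $\langle Kf, g_\lambda \rangle$ can be
estimated by

$$z_{n,\lambda} := \langle y_n, g_\lambda \rangle. \tag{2.4}$$

Mimicking the reconstruction formula (2.2) with the estimates (2.4) and
(2.3), we finally obtain a (family of) estimator(s) of $f$ by setting

$$\hat f_{n,\delta} = \sum_{1 \le \ell \le L} (K_{\delta,\ell})^{-1} \Big( \sum_{\lambda \in \Lambda_\ell} z_{n,\lambda}\, g_\lambda\, \mathbf{1}_{\{\sum_{\lambda \in \Lambda_\ell} z_{n,\lambda}^2 \ge \tau_\ell^2\}} \Big) \mathbf{1}_{\mathcal{E}_{\delta,\ell}(\kappa_\ell)},$$

where

$$\mathcal{E}_{\delta,\ell}(\kappa_\ell) = \big\{ \|(K_{\delta,\ell})^{-1}\|_{G_\ell \to H_\ell} \le \kappa_\ell \big\}.$$

The procedure is specified by the maximal frequency level $L$ and the
threshold levels

$$\kappa_\ell = \lambda_0\, |\Lambda_\ell|^{-1/2} \big(\delta^2 |\log \delta|\big)^{-1/2} \tag{2.5}$$

and

$$\tau_\ell = \mu_0\, |\Lambda_\ell|^{1/2} \big(n^{-1} \log n\big)^{1/2}, \tag{2.6}$$

for some prefactors $\lambda_0, \mu_0 > 0$. The threshold rule we
introduce in both the signal (with level $\tau_\ell$) and the operator
(with level $\kappa_\ell$) is inspired by classical block thresholding
[21, 4, 3] and will enable us to adapt with respect to the smoothness
properties of both the signal $f$ and the operator $K$, see below.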
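
The two thresholding steps can be sketched on a single block as follows. This is only an illustration under simplifying assumptions that are ours, not the paper's: the noisy operator is taken to be diagonal on the block (so that the operator norm of its inverse is the reciprocal of its smallest entry in modulus), and the thresholds are coded as we read (2.5) and (2.6). The function names `kappa`, `tau` and `estimate_block` are hypothetical.

```python
import math

def kappa(block_size, delta, lam0=1.0):
    """Operator cutoff, read as lam0 * |Lambda_l|^(-1/2) * (delta^2 |log delta|)^(-1/2)."""
    return lam0 / math.sqrt(block_size * delta**2 * abs(math.log(delta)))

def tau(block_size, n, mu0=1.0):
    """Signal threshold, read as mu0 * |Lambda_l|^(1/2) * (log(n) / n)^(1/2)."""
    return mu0 * math.sqrt(block_size * math.log(n) / n)

def estimate_block(z, k_delta, n, delta, lam0=1.0, mu0=1.0):
    """Estimated coefficients of f on one block (assumption: diagonal operator).

    z       : noisy coefficients <y_n, g_lambda> for lambda in the block
    k_delta : noisy diagonal entries of K_delta on the same block
    """
    size = len(z)
    # Operator norm of the inverse of a diagonal matrix: 1 / min |entry|.
    smallest = min(abs(k) for k in k_delta)
    inverse_norm = math.inf if smallest == 0 else 1.0 / smallest
    keep_operator = inverse_norm <= kappa(size, delta, lam0)
    # The whole block is kept only if its observed energy exceeds tau^2.
    keep_signal = sum(abs(v) ** 2 for v in z) >= tau(size, n, mu0) ** 2
    if not (keep_operator and keep_signal):
        return [0.0] * size
    return [v / k for v, k in zip(z, k_delta)]
```

A block is therefore set to zero either because the noisy operator is too close to singular on it, or because the observed signal energy is too small; otherwise it is inverted as a whole.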



3 Main results 

3.1 Minimax rates of convergence 

We assess the performance of the estimator $\hat f_{n,\delta}$ defined in
Section 2.2 over the Sobolev spaces linked to the basis
$(e_\lambda, \lambda \in \Lambda)$ defined in (2.1). Define the Sobolev
balls $W^s(M) = \{f \in W^s,\ \|f\|_{W^s} \le M\}$ for $M > 0$ and let

$$\mathcal{G}^\nu(Q) = \{K \in \mathcal{G},\ Q_i(K) \le Q_i,\ i = 1, 2\} \tag{3.1}$$

for $Q = (Q_1, Q_2)$ with $Q_i > 0$, $Q_1 Q_2 \ge 1$, where the mapping
constants $Q_i(K)$ are defined in Assumption 2.2.

Theorem 3.1 (Upper bounds). Let $\mathcal{G}$ be a class of operators
satisfying Assumptions 2.1 and 2.2. Assume we observe $(y_n, K_\delta)$
given by (1.1) and (1.2), with $n > 1$ and $\delta \le \delta_0 < 1$.
Specify $\hat f_{n,\delta}$ with

$$L = \lfloor (\delta^2)^{-1/(2\nu+d-1)} \rfloor \wedge \lfloor n^{1/(2\nu+d)} \rfloor \tag{3.2}$$

and $\kappa_\ell, \tau_\ell$ as in (2.5) and (2.6). For sufficiently
small $\lambda_0$ and sufficiently large $\mu_0$, for every $s, M > 0$,
$Q = (Q_1, Q_2)$ with $Q_i > 0$ and such that $Q_1 Q_2 \ge 1$, we have

$$\sup_{f \in W^s(M),\, K \in \mathcal{G}^\nu(Q)} \mathbb{E}\big[\|\hat f_{n,\delta} - f\|^2_{\mathbb{H}}\big] \lesssim \big(\delta^2 |\log \delta|\big)^{1 \wedge 2s/(2\nu+d-1)} \vee \big(n^{-1} \log n\big)^{2s/(2(s+\nu)+d)} \tag{3.3}$$

where $\lesssim$ means inequality up to a multiplicative constant that
depends on $d, s, \nu, M, Q$ and $\mu_0, \lambda_0$ only.

The bounds for $\mu_0$ and $\lambda_0$ are explicitly computable. In the
model generated by $y_n$ in (1.1) and $K_\delta$ in (1.2), they depend on
the dimension $d$ and on the absolute constants $c_0$ and $c_1$ of the
concentration lemmas 5.3 and 5.6 below. However, they are in practice
much too conservative, as is well known in the signal detection case [13]
or the classical inverse problem case [1], see the numerical
implementation in Section 4.

Our next result shows that the rate achieved by $\hat f_{n,\delta}$ is
indeed optimal, up to logarithmic terms. The lower bound in the case
$\delta = 0$ is classical (Nussbaum and Pereverzev [26]) and will not
decrease for increasing noise levels $\delta$ or $n^{-1/2}$, whence it
suffices to treat the case which formally corresponds to observing $Kf$
without noise while $K$ remains unknown.

Theorem 3.2 (Lower bounds). In the same setting as Theorem 3.1, with in
addition $Q_2 > 1/Q_1$, assume we observe $Kf$ exactly and $K_\delta$
given by (1.2). For sufficiently small $\delta$, we have

$$\inf_{\hat f}\ \sup_{f \in W^s(M),\, K \in \mathcal{G}^\nu(Q)} \mathbb{E}\big[\|\hat f - f\|^2_{\mathbb{H}}\big] \gtrsim (\delta^2)^{1 \wedge 2s/(2\nu+d-1)} \tag{3.4}$$

where $\gtrsim$ means inequality up to a positive multiplicative constant
that depends on $d, s, \nu, M$ and $Q$ only.

Combining (3.3) together with (3.4) and the results of [26], we conclude
that $\hat f_{n,\delta}$ is minimax over $W^s(M)$ to within logarithmic
terms in $n$ and $\delta$, and that this result is uniform over the
nuisance parameter $K \in \mathcal{G}^\nu(Q)$.
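
To get a feeling for the orders of magnitude involved, the maximal level (3.2) and the right-hand side of the upper bound (3.3) can be evaluated numerically. The sketch below codes both formulas as we read them, ignoring the multiplicative constant; the function names and the sample values of $n$, $\delta$, $s$, $\nu$, $d$ are illustrative assumptions only.

```python
import math

def max_level(n, delta, nu, d):
    """Maximal frequency level L, read as the minimum of two integer parts."""
    from_operator = math.floor((delta**2) ** (-1.0 / (2 * nu + d - 1)))
    from_signal = math.floor(n ** (1.0 / (2 * nu + d)))
    return min(from_operator, from_signal)

def upper_rate(n, delta, s, nu, d):
    """Right-hand side of the upper bound, up to the multiplicative constant."""
    operator_term = (delta**2 * abs(math.log(delta))) ** min(1.0, 2 * s / (2 * nu + d - 1))
    signal_term = (math.log(n) / n) ** (2 * s / (2 * (s + nu) + d))
    return max(operator_term, signal_term)

# Signal noise dominates for n = 10^6, operator noise for n = 10^12.
rate_signal_dominant = upper_rate(10**6, 1e-2, s=2, nu=1, d=2)
rate_operator_dominant = upper_rate(10**12, 1e-2, s=2, nu=1, d=2)
```

With $\delta = 10^{-2}$, $s = 2$, $\nu = 1$ and $d = 2$, the bound is driven by the signal term for $n = 10^6$ and by the operator term for $n = 10^{12}$, which is the regime $n^{-1/2} \ll \delta$ discussed in the text.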
3.2 Discussion

The case of diagonal operators

It is interesting to notice that the condition $Q_2 > 1/Q_1$ in Theorem
3.2 excludes the case where $K$ is diagonal. In this particular case,
considered especially in the deconvolution example of Section 4.2 below,
a closer inspection of the proof of the upper bound shows that the rate

$$n^{-s/(2(s+\nu)+d)} \vee \delta^{1 \wedge s/\nu}$$

can be obtained (up to some extra logarithmic factors) as in the case
where $d = 1$, which improves on the rate

$$n^{-s/(2(s+\nu)+d)} \vee \delta^{1 \wedge 2s/(2\nu+d-1)}$$

provided by Theorem 3.1. This sheds some light on the role of the number
$d$. It is in fact twofold: it acts as a 'dimension' in the term
$n^{-2s/(2(s+\nu)+d)}$; in the term involving the error in the operator
$\delta$, it reflects the distance to the diagonal case, expanding from
$\delta^{1 \wedge s/\nu}$ in the diagonal case to
$\delta^{1 \wedge 2s/(2\nu+d-1)}$ in the case $Q_2 > 1/Q_1$. It is very
plausible, though beyond the scope of this paper, to express conditions
on $K$ leading to rates of the form $2s/(2\nu + a)$, with $a$
continuously varying from $0$ to $d - 1$. Note that in the case $d = 1$,
we recover the minimax rate of density deconvolution with unknown error
as proved by Neumann [24], see also [9].
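
As a quick illustration of the gap between the two operator exponents (the numerical values are our own choice, not taken from the paper), take $d = 2$, $s = 1$ and $\nu = 2$:

```latex
\[
1 \wedge \frac{2s}{2\nu + d - 1} = 1 \wedge \frac{2}{5} = \frac{2}{5},
\qquad
1 \wedge \frac{s}{\nu} = 1 \wedge \frac{1}{2} = \frac{1}{2}.
\]
```

Since $\delta < 1$, the diagonal case improves the operator term from $\delta^{2/5}$ to $\delta^{1/2}$. For $d = 1$ the two exponents coincide, since $2s/(2\nu + d - 1) = s/\nu$.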



Relation to other works in the case of sparse operators

For an unknown signal $f$ with smoothness $s > 0$ and unknown operator
with degree of ill-posedness $\nu > 0$, the optimal rates of convergence
are of the form

$$n^{-\alpha(s,\nu)/2} \vee \delta^{\beta(s,\nu)} \tag{3.5}$$

up to inessential logarithmic terms. The exponents $\alpha(s,\nu)$ and
$\beta(s,\nu)$ are linked respectively to the error in the signal $y_n$
and the error in the operator $K_\delta$. Efromovich and Koltchinskii
[14] established that under fairly general conditions on the operator
$K$, the optimal exponents are given by

$$\alpha(s,\nu) = \beta(s,\nu) = \frac{2s}{2(s+\nu)+d}.$$

They noted however that if certain sparsity properties on $K$ are
moreover assumed (that we shall not describe here, for instance if $K$ is
diagonal in an appropriate basis) then the exponent
$\beta(s,\nu) = \frac{2s}{2(s+\nu)+d}$ is no longer optimal, while
$\alpha(s,\nu)$ remains unchanged.

In the related context of operators acting on Besov spaces
$B^s_{p,p}([0,1]^d)$ of functions with smoothness $s$ measured in
$L^p$-norm, Hoffmann and Reiß [18] introduce an ad hoc hypothesis on the
sparsity of the unknown operator (that we shall not describe here
either), expressed in terms of the wavelet discretization of $K$. They
subsequently obtain new rates of convergence for a certain nonlinear
wavelet procedure, and these rates outperform (3.5), as expected from the
results of [14]. In particular, if one considers the estimation of
$f \in B^s_{2,2}$ in the extreme case where the operator $K$ is diagonal
in a wavelet basis, the procedure in [18] achieves the rate

$$n^{-\alpha(s,\nu)/2} \vee \delta^{1 \wedge (s-d/2)/\nu} \tag{3.6}$$

up to extra logarithmic terms. We may compare our results with the rate
(3.6). In our setting, if we pick $(e_\lambda, \lambda \in \Lambda)$ as
the Fourier basis described in Section 4.2, then we have
$W^s = B^s_{2,2}([0,1]^d)$. Assuming $K$ to be diagonal in the basis
$(e_\lambda, \lambda \in \Lambda)$, which is the exact counterpart of the
approach of Hoffmann and Reiß with $K$ being diagonal in a wavelet basis,
then by Theorem 3.1, our estimator $\hat f_{n,\delta}$ (nearly) achieves
the rate

$$n^{-\alpha(s,\nu)/2} \vee \delta^{1 \wedge 2s/(2\nu+d-1)}$$

which already outperforms the rate (3.6) whenever the error in the signal
$y_n$ is dominated by the error in the operator and $s$ is small compared
to $\nu$, as follows from the elementary inequality

$$2s/(2\nu + d - 1) \ge (s - d/2)/\nu \quad\text{for}\quad 2\nu + d - 1 \ge 2s.$$

The superiority of the blockwise SVD in this setting is explained by the
fact that the wavelet procedure in [18] is agnostic to the diagonal
structure of $K$ in the wavelet basis, in contrast to
$\hat f_{n,\delta}$, which takes full advantage of the block structure of
$K$. As already explained in the preceding section, one could actually
improve this result further in the specific case of $K$ being diagonal in
$(e_\lambda, \lambda \in \Lambda)$ and show that $\hat f_{n,\delta}$
(nearly) achieves the rate
$n^{-\alpha(s,\nu)/2} \vee \delta^{1 \wedge s/\nu}$, thus deleting the
'dimensional effect' of $d$ for the error in the operator.
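
For completeness, the elementary inequality between the exponents $2s/(2\nu+d-1)$ and $(s-d/2)/\nu$ can be checked directly. The short derivation below is ours and only assumes $s \ge 0$, $\nu > 0$ and $d \ge 1$:

```latex
\begin{align*}
\frac{2s}{2\nu+d-1} \ge \frac{s-d/2}{\nu}
  &\iff 2s\nu \ge \Big(s-\frac{d}{2}\Big)(2\nu+d-1)
      = 2s\nu + s(d-1) - \frac{d}{2}(2\nu+d-1) \\
  &\iff \frac{d}{2}\,(2\nu+d-1) \ge s(d-1).
\end{align*}
% Under the condition 2\nu + d - 1 \ge 2s, the left-hand side is at least
% d s, which is at least s(d-1), so the inequality holds.
```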

Adaptation over the scales $\{W^s, s > 0\}$ and $\{\mathcal{G}^\nu, \nu > 0\}$

The estimator $\hat f_{n,\delta}$ is fully adaptive over the family of
Sobolev balls $\{W^s(M), s > 0, M > 0\}$ (in the sense that
$\hat f_{n,\delta}$ does not require the knowledge of $s$ nor $M$).
However, the knowledge of the degree of ill-posedness $\nu$ of $K$ is
required through the choice of the maximal frequency $L$ in (3.2). This
restriction can actually be relaxed further in dimension $d \ge 2$.
Indeed, setting formally $\nu = 0$ in (3.2), one readily checks that
$\hat f_{n,\delta}$ becomes adaptive over $\{W^s(M), s > 0, M > 0\}$ and
$\{\mathcal{G}^\nu(Q), \nu > 0, Q = (Q_1, Q_2), Q_1 Q_2 \ge 1\}$
simultaneously. In dimension $d = 1$ however, setting $\nu = 0$ in (3.2)
is forbidden, but an alternative adaptivity result can be obtained by
taking $L = \lfloor (\delta^2)^{-1/s_0} \rfloor \wedge n$ for some
$s_0 > 0$, in which case $\hat f_{n,\delta}$ is fully adaptive over the
scale $\{\mathcal{G}^\nu(Q), \nu > 0, Q = (Q_1, Q_2), Q_1 Q_2 \ge 1\}$
and the restricted family $\{W^s(M), s \ge s_0, M > 0\}$.

Extension to density estimation

We briefly sketch a line of proof for extending Theorem 3.1 to the
framework of density estimation. Suppose that instead of $y_n$ we observe
a random sample $Z_1, \ldots, Z_n$ drawn from $Kf$, assumed to be a
probability density. By analogy with (2.4), we have an estimator of
$\mathbb{P}(g_\lambda) = \langle Kf, g_\lambda \rangle$ by replacing
$\langle y_n, g_\lambda \rangle$ with

$$\mathbb{P}_n(g_\lambda) = n^{-1} \sum_{i=1}^n g_\lambda(Z_i).$$

Writing

$$\mathbb{P}_n(g_\lambda) = \langle Kf, g_\lambda \rangle + n^{-1/2} \eta_{n,\lambda},$$

with $\eta_{n,\lambda} = n^{1/2}\big(\mathbb{P}_n(g_\lambda) - \mathbb{P}(g_\lambda)\big)$,
an inspection of the proof of Theorem 3.1 reveals that an extension to
the density estimation setting carries over as soon as the vector
$(\eta_{n,\lambda}, \lambda \in \Lambda_\ell)$ satisfies a concentration
inequality, namely

$$\exists \beta_0 > 0,\ c_1 > 0,\ \forall \beta \ge \beta_0, \quad \mathbb{P}\Big(|\Lambda_\ell|^{-1} \sum_{\lambda \in \Lambda_\ell} \eta^2_{n,\lambda} \ge \beta^2\Big) \le \exp\big(-c_1 \beta^2 |\Lambda_\ell|\big),$$

see (5.6) in Lemma 5.2. To that end, we may apply a concentration
inequality by Bousquet [2] as developed for instance in Massart [23], Eq.
(5.51) p. 171. The precise control of this extension requires further
properties on the basis $(g_\lambda, \lambda \in \Lambda)$ and on the
density $Kf$ via the behaviour of
$\sum_{\lambda \in \Lambda_\ell} \mathrm{Var}(g_\lambda(Z_1))$, see Eq.
(5.52) p. 171 in [23]. We do not pursue that here.
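
The empirical coefficients and the normalised block energy entering the concentration inequality can be computed as in the following sketch. It rests on assumptions of ours that are not in the paper: a real trigonometric basis on $[0, 1)$, a uniform sample (so that $Kf = 1$ and $\langle Kf, g_k \rangle = 0$ for $k \ge 1$), and a hypothetical block of four indices.

```python
import math
import random

def g(k, x):
    """Real trigonometric basis on [0, 1): g_0 = 1, g_k = sqrt(2) cos(2 pi k x)."""
    return 1.0 if k == 0 else math.sqrt(2.0) * math.cos(2.0 * math.pi * k * x)

def empirical_coefficient(sample, k):
    """P_n(g_k) = n^{-1} * sum_i g_k(Z_i)."""
    return sum(g(k, z) for z in sample) / len(sample)

random.seed(0)
n = 10_000
sample = [random.random() for _ in range(n)]   # Z_i uniform on [0, 1), so Kf = 1
block = [1, 2, 3, 4]                           # hypothetical block of indices

# For the uniform density, P(g_k) = <Kf, g_k> = 0 for every k >= 1.
eta = [math.sqrt(n) * (empirical_coefficient(sample, k) - 0.0) for k in block]
normalised_energy = sum(e * e for e in eta) / len(block)
```

In this toy case each $\eta_{n,k}$ has unit variance, so `normalised_energy` fluctuates around 1; the concentration inequality asks that large values of this quantity be exponentially unlikely in the block size.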
4 Application to blind deconvolution

4.1 Spherical deconvolution

Scientific context. A common challenge in astrophysics is the analysis of
complex data sets consisting of a number of objects or events, such as
galaxies of a particular type or ultra high energy cosmic rays (UHECR),
that are genuinely distributed over the celestial sphere. Such objects or
events are distributed according to a probability density $f$ on the
sphere, depending itself on the physics that governs the production of
these objects or events. For instance, UHECR are particles of unknown
nature arriving at the earth from apparently random directions of the
sky. They could originate from long-lived relic particles from the Big
Bang. Alternatively, they could be generated by the acceleration of
standard particles, such as protons, in extremely violent astrophysical
phenomena. They could also originate from Active Galactic Nuclei (AGN),
or from neutron stars surrounded by extremely high magnetic fields. As a
consequence, in some hypotheses, the underlying probability distribution
for observed UHECRs would be a finite sum of point-like sources. In other
hypotheses, the distribution could be uniform, or smooth and correlated
with the local distribution of matter in the universe. The distribution
could also be a superposition of the above. Discriminating between these
hypotheses is of primary importance for understanding the origin and
mechanism of production of UHECRs. The observations, denoted by $X_i$,
are often perturbed by an experimental noise, say $\varepsilon_i$, which
leads to the deconvolution problem described in Section 1.1. Following
van Rooij and Ruymgaart [33], Healy et al. [16], Kim and Koo [22] and
Kerkyacharian et al. [25], we assume the following model: we observe an
$n$-sample $(Z_1, \ldots, Z_n)$ with

$$Z_i = \varepsilon_i X_i, \quad i = 1, \ldots, n,$$

where the $X_i$ are distributed on the sphere $\mathbb{S}^2$, with common
density $f$ with respect to the uniform probability distribution
$\mu(d\omega)$ on $\mathbb{S}^2$ and independent of the $\varepsilon_i$,
which have a common density $g$ with respect to the Haar probability
measure $dr$ on the group $SO(3)$ of $3 \times 3$ rotation matrices. One
proves in [16, 22] that the density of the $Z_i$ is

$$Kf(\omega) = g * f(\omega) := \int_{SO(3)} g(r) f(r^{-1}\omega)\,dr, \quad \omega \in \mathbb{S}^2 \tag{4.1}$$

and we are interested in the case where the exact form of $g$ in the
convolution operator $K = g * \cdot$ is unknown, due for instance to
insufficient knowledge of the device that is used to measure the
observations. However, we assume approximate knowledge of $g$ through
$K_\delta$ as defined in (1.2). □

Checking the blockwise SVD Assumptions 2.1 and 2.2. We closely follow the
exposition of [16, 22, 25] for an overview of Fourier theory on
$\mathbb{S}^2$ and $SO(3)$ in order to establish rigorously the
connection to Theorems 3.1 and 3.2. Define

$$u(\varphi) = \begin{pmatrix} \cos\varphi & -\sin\varphi & 0 \\ \sin\varphi & \cos\varphi & 0 \\ 0 & 0 & 1 \end{pmatrix} \quad\text{and}\quad a(\theta) = \begin{pmatrix} \cos\theta & 0 & \sin\theta \\ 0 & 1 & 0 \\ -\sin\theta & 0 & \cos\theta \end{pmatrix},$$

where $\varphi \in [0, 2\pi)$, $\theta \in [0, \pi)$. Every rotation
$r \in SO(3)$ has representation $r = u(\varphi) a(\theta) u(\psi)$ for
some $\varphi, \psi \in [0, 2\pi)$, $\theta \in [0, \pi)$. Define the
rotational harmonics

$$D^l_{mn}(r) = D^l_{mn}(\varphi, \theta, \psi) = e^{-i(m\varphi + n\psi)} P^l_{mn}(\cos\theta)$$

for $l \in \mathbb{N}$, $-l \le m, n \le l$, where $P^l_{mn}$ are the
second type Legendre functions described in detail in [34]. The
$D^l_{mn}$ are the eigenfunctions of the Laplace-Beltrami operator on
$SO(3)$, hence the family $(\sqrt{2l+1}\, D^l_{mn})$ forms a complete
orthonormal basis of $L^2(dr)$ on $SO(3)$, where $dr$ is the Haar
probability measure. Every $h \in L^2(dr)$ has a rotational Fourier
transform

$$\mathcal{F}(h)^l_{mn} = \int_{SO(3)} h(u) D^l_{mn}(u)\,du, \quad l \in \mathbb{N},\ -l \le m, n \le l,$$

and for every $h \in L^2(dr)$ we have a reconstruction formula

$$h = \sum_{l \in \mathbb{N}} \sum_{-l \le m, n \le l} \mathcal{F}(h)^l_{mn}\, D^l_{mn}(\cdot^{-1}).$$

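An analogous analysis is available on $\mathbb{S}^2$. Any point
$\omega \in \mathbb{S}^2$ is determined by its spherical coordinates
$\omega = (\sin\theta \cos\varphi, \sin\theta \sin\varphi, \cos\theta)$
for some $\theta \in [0, \pi)$, $\varphi \in [0, 2\pi)$. Define

$$Y^l_m(\omega) = Y^l_m(\theta, \varphi) = (-1)^m \sqrt{\frac{2l+1}{4\pi} \frac{(l-m)!}{(l+m)!}}\; P^l_m(\cos\theta)\, e^{im\varphi} \tag{4.2}$$
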
for $l \in \mathbb{N}$, $-l \le m \le l$, where $P^l_m$ are the Legendre
functions. We have $Y^l_{-m} = (-1)^m \overline{Y^l_m}$ and the
$(Y^l_m)$ constitute an orthonormal basis of $L^2(\mu)$ on
$\mathbb{S}^2$, generally referred to as the spherical harmonic basis.
Any $f \in L^2(\mu)$ has a spherical Fourier transform

$$\mathcal{F}(f)^l_m = \int_{\mathbb{S}^2} f(\omega)\, \overline{Y^l_m(\omega)}\, \mu(d\omega)$$

and a reconstruction formula

$$f = \sum_{l \in \mathbb{N}} \sum_{-l \le m \le l} \mathcal{F}(f)^l_m\, Y^l_m.$$

The Fourier transform of the convolution is given by

$$\mathcal{F}(g * f)^l_m = \sum_{n=-l}^{l} \mathcal{F}(g)^l_{mn}\, \mathcal{F}(f)^l_n \tag{4.3}$$

and we retrieve the blockwise SVD formalism of Section 2.1 in dimension
$d = 2$ by setting $\mathbb{H} = \mathbb{G} = L^2(\mathbb{S}^2, \mu)$,
where $\mu$ is the probability Haar measure on $\mathbb{S}^2$, and

$$e_\lambda = g_\lambda = Y^\ell_m \quad\text{with}\quad \lambda = (m, \ell), \quad \Lambda_\ell = \{(m, \ell),\ -\ell \le m \le \ell\}.$$

We have $|\Lambda_\ell| = 2\ell + 1$ and by (4.3), $K_\ell$ is the finite
dimensional operator stable on
$\mathrm{Span}\{e_\lambda, \lambda \in \Lambda_\ell\}$ with matrix having
entries

$$(K_\ell)_{mn} = \mathcal{F}(g)^\ell_{mn}.$$

Hence Assumption 2.1 is satisfied. Notice also that in this case $K_\ell$
is generally not diagonal. Assumption 2.2 is satisfied as we assume that
$g$ is ordinary smooth in the terminology of Kim and Koo [22]. Our
Assumption 2.2 exactly matches the constraint (3.6) in their paper, with
examples given by the Laplace distribution on the sphere ($\nu = 2$) or
the Rosenthal distribution ($\nu > 0$ arbitrary). □

Numerical implementation. Following Kerkyacharian, Pham Ngoc and Picard
[25] in their Example 2, we take
$f(\omega) = C \exp(-4 \|\omega - \omega_1\|^2)$ with
$\omega_1 = (0, 1, 0)$ and $C = 1/0.7854$. We have
$\|f\|_{L^2(\mu)} = 0.7469$. Here $g$ is the density of a Laplace
distribution on $SO(3)$, defined through
$\mathcal{F}(g)^\ell_{mn} = \delta_{mn} (1 + \ell(\ell+1))^{-1}$. Hence,
the matrices $(K_\ell)_{mn}$ are homotheties whose ratios behave as
$\ell^{-2}$. We have $\nu = 2$.

We plot in Figure 1 a 1000-sample of $X_i$ with density $f$ on the
sphere, and in Figure 2 the action of the $\varepsilon_i$ on the $X_i$,
where the $\varepsilon_i$ are distributed according to $g$. Note that for
the estimation of $g$, we have access to a noisy version of $g$ with
noise level $\delta$ only.
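
For the Laplace distribution on $SO(3)$ used in this numerical example, the mapping constants of Assumption 2.2 can be evaluated directly, since each block is a homothety with ratio $(1 + \ell(\ell+1))^{-1}$. The sketch below is only a numerical check under assumptions of ours: the supremum is truncated at level 2000 and taken over $\ell \ge 1$.

```python
def laplace_ratio(level):
    """Homothety ratio of the block K_l: 1 / (1 + l (l + 1))."""
    return 1.0 / (1.0 + level * (level + 1))

nu = 2
levels = range(1, 2001)  # truncated range of levels (assumption)

# For a homothety, ||K_l|| is the ratio and ||K_l^{-1}|| its reciprocal.
q1 = max(level ** (-nu) / laplace_ratio(level) for level in levels)
q2 = max(level ** nu * laplace_ratio(level) for level in levels)
```

Here `q1` is attained at $\ell = 1$ and equals 3, while `q2` increases towards 1 as $\ell$ grows, so both constants are finite with $\nu = 2$, in line with the text.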
Figure 2: Data from $f * g$. Plot of $n = 1000$ data $\varepsilon_i X_i$
on the sphere $\mathbb{S}^2$ with common distribution $Kf = f \star g$.
The $X_i$ are the data pictured in Figure 1 and the $\varepsilon_i$ are
sampled according to $g$ (planar representation).

Figure 3: Target density $f$. The representation is simplified through a
view from above the sphere through the $Oz$-axis.



We display below the (renormalised) empirical squared error of
$\hat f_{10^8,\delta}$ (oracle choice $\lambda_0 = 1$, $\mu_0 = 1$) for
1000 Monte-Carlo simulations for several values of $\delta$. The noise
level $\delta$ is to be compared with the noise level
$n^{-1/2} = 10^{-4}$. The latter is chosen positive, in order to show the
interaction between the two types of error, and sufficiently small to
emphasize the influence of $\delta$ on the process of estimation.

Noise level $\delta$    0         $10^{-3}$   $3 \cdot 10^{-3}$   $5 \cdot 10^{-3}$   $10^{-2}$
Mean error              0.0466    0.0542      0.1732              0.2784              0.4335
Standard dev.           0.0011    0.0022      0.0126              0.0355              0.0466


Finally, on a specific sample of $n = 10^8$ data, we plot the target
density $f$ (Figure 3) and its reconstruction for $n = 10^8$ data with
$\delta = 0$ (Figure 4) and $\delta = 3 \cdot 10^{-3}$ (Figure 5). At a
visual level, we simplify the representation by plotting $f$ and its
reconstruction with a view from above the sphere through the $Oz$ axis.
We see that the contour in Figure 5 is not well recovered in the regions
where $f$ is small (on the right side of the graph in Figure 5). The
choice of $\lambda_0, \mu_0$ remains unchanged. □

Figure 5: Reconstruction for $n = 10^8$ and $\delta = 3 \cdot 10^{-3}$.
The reconstruction is polluted simultaneously by the limited number of
observations $n$ and the noise level $\delta$ in the blurring $g$.
4.2 Circular deconvolution

Scientific context. In many engineering problems, the observation of a
signal $f$ or image is distorted by the action of a linear operator $K$.
We assume for simplicity that $f$ lives on the torus $\mathbb{T} = [0, 1]$
(or $[0, 1]^d$) appended with periodic boundary conditions. In many
instances, the restoration of $f$ from the noisy observation of $Kf$ is
challenged by the additional uncertainty about the operator $K$. This is
the case for instance in electronic microscopy [27] for the restoration
of fluorescence Confocal Laser Scanning Microscope (CLSM) images. In
other words, the quality of the image suffers from two physical
limitations: measurement errors or limited accuracy, and the fact that
the exact PSF (the incoherent point spread function) that accounts for
the blurring of $f$ (mathematically the action of $K$) is not precisely
known. This is a classical issue that goes back to [31, 15]. An idealised
additive Gaussian model for the noise contamination yields the
observation (1.1) with

$$Kf(x) = g * f(x) := \int_{\mathbb{T}^d} g(u) f(x - u)\,du, \quad x \in \mathbb{T}^d.$$

The degradation process $K = g * \cdot$ is characterised by the impulse
response function $g$. In most cases of interest, we do not know the
exact form of $g$. In a condensed idealised statistical setup, we have
access to

$$g_\delta = g + \delta\, \dot W', \tag{4.4}$$

where $\dot W'$ is another Gaussian white noise defined on $L^2(\mu)$ and
independent of $\dot W$. Experimental approaches that justify
representation (4.4) are described in [10, 20, 30]. □

Checking Assumptions 2.1 and 2.2. We obviously have
$\mathbb{H} = \mathbb{G} = L^2(\mathbb{T}^d)$ and the bases
$(e_\lambda)$ and $(g_\lambda)$ will coincide with the $d$-dimensional
extension of the circular trigonometric basis
$(e^{2i\pi kx}, k \in \mathbb{Z})$ if we set:

$$e_\lambda(x_1, \ldots, x_d) = \prod_{j=1}^d e^{2i\pi k_j x_j}, \quad (x_1, \ldots, x_d) \in \mathbb{T}^d,$$

where we put

$$\lambda = (k_1, \ldots, k_d), \quad \ell = |\lambda| = 1 + \sum_{j=1}^d |k_j| \quad\text{and}\quad \ell \ge 1.$$

Any $f \in L^2(\mu)$ has a Fourier transform
$\mathcal{F}(f)_\lambda = \int_{\mathbb{T}^d} f(x) e_\lambda(x)\, \mu(dx)$
and moreover, if $g \in L^2(\mu)$, we have

$$\mathcal{F}(f * g)_\lambda = \mathcal{F}(f)_\lambda\, \mathcal{F}(g)_\lambda.$$

Therefore, $K$ is diagonal in the basis
$(e_\lambda, \lambda \in \Lambda)$, hence stable. Moreover, with
$\Lambda_\ell = \{\lambda,\ |\lambda| = \ell\}$, we have
$|\Lambda_\ell| \sim \ell^{d-1}$. Moreover
$K_\ell = \mathrm{Diag}(\mathcal{F}(g)_\lambda, \lambda \in \Lambda_\ell)$
and Assumption 2.1 follows. Assuming that $g$ satisfies
$c |\lambda|^{-\nu} \le |\mathcal{F}(g)_\lambda| \le c' |\lambda|^{-\nu}$
for some $\nu \ge 0$ and constants $c, c' > 0$, we readily obtain
Assumption 2.2. Note also that since $K$ is diagonal in the basis
$(e_\lambda, \lambda \in \Lambda)$, observing $g_\delta$ in the
representation (4.4) is equivalent to observing $K_\delta$ in (1.2). □
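
The growth of the block sizes in the circular case can be checked by brute force. The sketch below counts the frequencies $k \in \mathbb{Z}^d$ with $1 + |k_1| + \cdots + |k_d| = \ell$; the function name `block_size` is ours and the enumeration is only meant for small $\ell$ and $d$.

```python
from itertools import product

def block_size(level, d):
    """Number of k in Z^d with 1 + |k_1| + ... + |k_d| == level."""
    m = level - 1
    return sum(
        1
        for k in product(range(-m, m + 1), repeat=d)
        if sum(abs(c) for c in k) == m
    )

sizes_d1 = [block_size(level, 1) for level in range(1, 6)]
sizes_d2 = [block_size(level, 2) for level in range(1, 6)]
```

In dimension 1 the blocks have bounded size, whereas in dimension 2 they grow linearly in the level, which matches the requirement that the block size be of order $\ell^{d-1}$.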

Numerical implementation. We numerically implement $\hat f_{n,\delta}$ in
dimension $d = 1$ in the case where there is no noise in the signal
(formally $n^{-1/2} = 0$) in order to illustrate the parametric effect
that dominates in the optimal rate of convergence in Theorems 3.1 and
3.2, which becomes $(\delta^2)^{s/\nu \wedge 1}$ in that case. We take as
target function $f : \mathbb{T} \to \mathbb{R}$ belonging to
$W^{5-\alpha}$ for all $\alpha > 1/2$ and defined by its Fourier
coefficients

$$\mathcal{F}(f)_\lambda = |\lambda|^{-5}, \quad \lambda \in \{-1000, \ldots, 1000\}.$$

We pick a family of blurring functions $g_\nu$ defined in the same manner
by the formula

$$\mathcal{F}(g_\nu)_\lambda = |\lambda|^{-\nu}, \quad \lambda \in \{-1000, \ldots, 1000\}, \quad \nu \in \{1, 4, 5, 6, 8\}.$$

We show in Figure 6, in a log-log plot, the mean-squared error of
$\hat f_{n,\delta}$ for the oracle choice $\mu_0 = 0$, $\lambda_0 = 1$
over 1000 Monte-Carlo simulations for $\nu \in \{1, 4, 5, 6, 8\}$ and
$\delta \in [10^{-4}, 10^{-1}]$. For small values of $\delta$ the slope
of the curve gives a rough estimate of the rate of convergence. We
visually see that for the critical case $\nu \le s = 5 - \alpha$ with
$\alpha > 1/2$ and below, the slope is close to 2, confirming the
parametric rate that is obtained whenever $\nu \le s$. □
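
A reduced version of this experiment can be sketched as follows. It is not the authors' code: it uses real Gaussian perturbations, frequencies limited to $|k| \le 200$, 200 repetitions, blocks of the form $\{k : 1 + |k| = \ell\}$, the cutoff (2.5) as we read it with $\lambda_0 = 1$, and the maximal level (3.2) with $d = 1$ and no signal noise. All names are hypothetical.

```python
import math
import random

def mean_error(delta, nu, reps=200, kmax=200, lam0=1.0, seed=0):
    """Monte-Carlo mean squared error without signal noise (d = 1)."""
    rng = random.Random(seed)
    max_level = min(math.floor(delta ** (-1.0 / nu)), kmax + 1)
    total = 0.0
    for _ in range(reps):
        for level in range(1, kmax + 2):
            size = 1 if level == 1 else 2          # k = 0, or k = +/-(level - 1)
            f_coef = level ** -5.0                 # Fourier coefficient of f
            g_coef = float(level) ** -nu           # Fourier coefficient of g_nu
            g_obs = [g_coef + delta * rng.gauss(0.0, 1.0) for _ in range(size)]
            smallest = min(abs(v) for v in g_obs)
            cutoff = lam0 / math.sqrt(size * delta**2 * abs(math.log(delta)))
            keep = level <= max_level and smallest > 0.0 and 1.0 / smallest <= cutoff
            for v in g_obs:
                # The coefficient of Kf is observed exactly: f_coef * g_coef.
                f_hat = f_coef * g_coef / v if keep else 0.0
                total += (f_hat - f_coef) ** 2
    return total / reps

# Empirical slope of the error between two noise levels, for nu = 1.
slope = math.log(mean_error(1e-2, 1) / mean_error(1e-3, 1)) / math.log(1e-2 / 1e-3)
```

Both calls use the same seed, so the two noise levels are compared on the same random draws. For $\nu = 1$, well below $s$, the slope comes out close to 2, which is the parametric behaviour described in the text.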



Figure 6: Estimation of the rate exponent when $n^{-1/2} \ll \delta$. Empirical squared error versus $\delta$ in log-log scale. Top to bottom: $\nu = 8, 6, 5, 4, 1$. The target function has smoothness $s = 5 - \alpha$ for all $\alpha > 1/2$. For $\nu < 4.5$, the slope of the curve is constant and close to 2, confirming the parametric rate predicted by the theory when the smoothness of the signal dominates the degree of ill-posedness of the operator. The empirical errors were computed using 1000 Monte-Carlo simulations.

5 Proofs

5.1 Preliminary estimates

Preparation. Recall that $H_\ell = \mathrm{Span}\{e_\lambda, \lambda \in \Lambda_\ell\}$, $G_\ell = \mathrm{Span}\{g_\lambda, \lambda \in \Lambda_\ell\}$ and that $P_\ell$ (resp. $Q_\ell$) denotes the orthogonal projector onto $G_\ell$ (resp. $H_\ell$). For $h \in H$, we have

$$P_\ell K h = P_\ell K Q_\ell h + P_\ell K (\mathrm{Id} - Q_\ell) h.$$

Using Assumption 2.1, we have $K(\mathrm{Id} - Q_\ell)h \in G_\ell^\perp$ and therefore $P_\ell K(\mathrm{Id} - Q_\ell)h = 0$. As a consequence

$$P_\ell K = K_\ell Q_\ell. \tag{5.1}$$

In turn, we have a convenient description of the observation $K_\delta$ defined in (1.2) and $y_n$ defined in (1.1) in terms of a sequence space model that we shall now describe. □

Notation. If $h \in G$, we denote by $h_\ell$ the (column) vector of coordinates of $P_\ell h$ in the basis $(g_\lambda, \lambda \in \Lambda_\ell)$. If $T : H \to G$ is a linear operator, we write $T_\ell$ for the matrix of the Galerkin projection $T_\ell = P_\ell T|_{H_\ell}$ of $T$. □

Sequence model for error in the operator. The observation of $K_\delta$ in (1.2) leads to the representation $P_\ell K_\delta|_{H_\ell} = P_\ell K|_{H_\ell} + \delta P_\ell B|_{H_\ell}$, or equivalently, in matrix notation

$$K_{\delta,\ell} = K_\ell + \delta B_\ell, \quad \ell \ge 1, \tag{5.2}$$

where $B_\ell$ is a $|\Lambda_\ell| \times |\Lambda_\ell|$ matrix with entries that are independent centred Gaussian random variables with unit variance. The following estimate is a classical concentration property of random matrices. For $\ell \le L$, $\|\cdot\|_{op}$ denotes the operator norm for $|\Lambda_\ell| \times |\Lambda_\ell|$ matrices (we shall skip the dependence upon $\ell$ in the notation).

Lemma 5.1 ([11], Theorem II.4). There are positive constants $\beta_0, c_0$ such that

$$\text{for all } \beta \ge \beta_0, \quad P\big(|\Lambda_\ell|^{-1/2} \|B_\ell\|_{op} > \beta\big) \le \exp\big(-c_0 \beta^2 |\Lambda_\ell|^2\big). \tag{5.3}$$

An immediate consequence of Lemma 5.1 is the following moment bound:

$$\text{for every } p > 0, \quad E\big[\|B_\ell\|_{op}^p\big] \lesssim |\Lambda_\ell|^{p/2}. \tag{5.4}$$

□ 
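The scaling behind Lemma 5.1 and (5.4), namely that $\|B_\ell\|_{op}$ is of order $|\Lambda_\ell|^{1/2}$, can be observed numerically. The sketch below is only an illustration: the function name `normalized_op_norms`, the matrix sizes and the number of repetitions are arbitrary choices, and it does not estimate the constants $\beta_0, c_0$.

```python
import numpy as np

def normalized_op_norms(sizes=(10, 50, 200), n_rep=20, seed=0):
    """Largest observed value of ||B||_op / sqrt(N) for N x N standard Gaussian B."""
    rng = np.random.default_rng(seed)
    out = {}
    for n in sizes:
        norms = [np.linalg.norm(rng.standard_normal((n, n)), ord=2)
                 for _ in range(n_rep)]
        out[n] = float(np.max(norms)) / np.sqrt(n)
    return out

ratios = normalized_op_norms()
for n, r in ratios.items():
    print(f"N = {n:4d}: max ||B||_op / sqrt(N) = {r:.2f}")
```

The ratios stay bounded as the dimension grows, which is the behaviour the concentration inequality quantifies.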

Sequence model for error in the signal. From (1.1), we observe the Gaussian measure $y_n$, or equivalently, thanks to (5.1)

$$P_\ell y_n = P_\ell K f + n^{-1/2} P_\ell W = K_\ell Q_\ell f + n^{-1/2} P_\ell W, \quad \ell \ge 1,$$

or, using the notation introduced in (2.4), in matrix notation

$$z_{n,\ell} = K_\ell f_\ell + n^{-1/2} \eta_\ell, \quad \ell \ge 1, \tag{5.5}$$

where we used (5.1), with $\eta_\ell$ denoting a vector of $|\Lambda_\ell|$ independent centred Gaussian random variables with unit variance.

The following result is a direct consequence of the fact that $\|\eta_\ell\|^2$ has a $\chi$-square distribution with $|\Lambda_\ell|$ degrees of freedom. The proof is standard.

Lemma 5.2. There are positive constants $\beta_1, c_1$ such that

$$\text{for all } \beta \ge \beta_1, \quad P\big(|\Lambda_\ell|^{-1/2} \|\eta_\ell\| > \beta\big) \le \exp\big(-c_1 \beta^2 |\Lambda_\ell|\big). \tag{5.6}$$

□ 
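A tail bound of the type (5.6) can be compared with simulation. The sketch below does not use the constants $\beta_1, c_1$ of Lemma 5.2, which are not specified; it swaps in the classical Chernoff bound for a $\chi^2_N$ variable, $P(\chi^2_N \ge N\beta^2) \le \exp\big(-\tfrac{N}{2}(\beta^2 - 1 - 2\log\beta)\big)$ for $\beta > 1$. The function name `chi_tail_check` and the sample sizes are hypothetical.

```python
import numpy as np

def chi_tail_check(n_dim, beta, n_rep=20000, seed=0):
    """Empirical P(N^{-1/2} ||eta|| >= beta) next to the Chernoff bound (beta > 1)."""
    rng = np.random.default_rng(seed)
    eta = rng.standard_normal((n_rep, n_dim))
    empirical = float(np.mean(np.linalg.norm(eta, axis=1) >= beta * np.sqrt(n_dim)))
    chernoff = float(np.exp(-0.5 * n_dim * (beta ** 2 - 1.0 - 2.0 * np.log(beta))))
    return empirical, chernoff

for n_dim, beta in [(10, 1.5), (50, 1.5), (10, 2.0)]:
    emp, bound = chi_tail_check(n_dim, beta)
    print(f"N = {n_dim}, beta = {beta}: empirical {emp:.2e}, Chernoff bound {bound:.2e}")
```

The bound decays exponentially in the dimension for fixed $\beta$, in line with the factor $|\Lambda_\ell|$ in the exponent of (5.6).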

5.2 Proof of Theorem 3.1 

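We have

$$\|\hat f_n - f\|_H^2 = \sum_{\ell \ge 1} \|\hat f_{n,\ell} - f_\ell\|^2 = \sum_{\ell=1}^{L} \|\hat f_{n,\ell} - f_\ell\|^2 + \sum_{\ell > L} \|f_\ell\|^2,$$

where $\|\cdot\|$ denotes the Euclidean norm on $\mathbb{R}^{|\Lambda_\ell|}$ (we shall omit any reference to $\ell$ when no confusion is possible). Concerning the bias term, we have

$$\sum_{\ell > L} \|f_\ell\|^2 \le \|f\|_{W^s}^2 L^{-2s} \tag{5.7}$$

and this term has the right order by definition of $L$ in (3.2). Concerning the stochastic term, thanks to our preliminary analysis, we may write

$$\hat f_{n,\ell} = (K_{\delta,\ell})^{-1} z_{n,\ell}\, 1_{\{\|(K_{\delta,\ell})^{-1}\|_{op} \le \kappa_\ell\}}\, 1_{\{\|z_{n,\ell}\| \ge \tau_\ell\}}.$$

We set

$$\mathcal{A}_\ell = \{\|(K_{\delta,\ell})^{-1}\|_{op} \le \kappa_\ell\} \quad \text{and} \quad \mathcal{B}_\ell = \{\|z_{n,\ell}\| \ge \tau_\ell\}.$$

We thus obtain the decomposition of the variance term as

$$\sum_{\ell=1}^{L} \|\hat f_{n,\ell} - f_\ell\|^2 \le I + II + III,$$

with

$$I = \sum_{\ell=1}^{L} \|(K_{\delta,\ell})^{-1} z_{n,\ell} - f_\ell\|^2\, 1_{\mathcal{A}_\ell \cap \mathcal{B}_\ell}, \quad II = \sum_{\ell=1}^{L} \|f_\ell\|^2\, 1_{\mathcal{A}_\ell^c}, \quad III = \sum_{\ell=1}^{L} \|f_\ell\|^2\, 1_{\mathcal{B}_\ell^c}.$$

We shall successively bound each term I, II and III.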

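• The term I, preliminary decomposition. Writing

$$z_{n,\ell} = (K_{\delta,\ell} - \delta B_\ell) f_\ell + n^{-1/2} \eta_\ell,$$

we obtain

$$(K_{\delta,\ell})^{-1} z_{n,\ell} - f_\ell = -\delta (K_{\delta,\ell})^{-1} B_\ell f_\ell + n^{-1/2} (K_{\delta,\ell})^{-1} \eta_\ell.$$

We introduce further the event $\{\|\delta B_\ell\|_{op} \le \alpha_\ell\}$ with $\alpha_\ell = \rho \kappa_\ell^{-1}$ for some $0 < \rho < \tfrac12$ and the condition $\{\|K_\ell f_\ell\| \ge \tfrac{\tau_\ell}{2}\}$. We thus have

$$I \lesssim IV + V + VI + VII,$$

with

$$IV = \sum_{\ell=1}^{L} \|\delta (K_{\delta,\ell})^{-1} B_\ell f_\ell\|^2\, 1_{\mathcal{A}_\ell \cap \mathcal{B}_\ell}\, 1_{\{\|\delta B_\ell\|_{op} \le \alpha_\ell\}}\, 1_{\{\|K_\ell f_\ell\| \ge \tau_\ell/2\}},$$

$$V = \sum_{\ell=1}^{L} \|n^{-1/2} (K_{\delta,\ell})^{-1} \eta_\ell\|^2\, 1_{\mathcal{A}_\ell \cap \mathcal{B}_\ell}\, 1_{\{\|\delta B_\ell\|_{op} \le \alpha_\ell\}}\, 1_{\{\|K_\ell f_\ell\| \ge \tau_\ell/2\}},$$

$$VI = \sum_{\ell=1}^{L} \|\delta (K_{\delta,\ell})^{-1} B_\ell f_\ell\|^2\, 1_{\mathcal{A}_\ell \cap \mathcal{B}_\ell} \big(1_{\{\|\delta B_\ell\|_{op} > \alpha_\ell\}} + 1_{\{\|K_\ell f_\ell\| < \tau_\ell/2\}}\big),$$

$$VII = \sum_{\ell=1}^{L} \|n^{-1/2} (K_{\delta,\ell})^{-1} \eta_\ell\|^2\, 1_{\mathcal{A}_\ell \cap \mathcal{B}_\ell} \big(1_{\{\|\delta B_\ell\|_{op} > \alpha_\ell\}} + 1_{\{\|K_\ell f_\ell\| < \tau_\ell/2\}}\big).$$

We shall next successively bound each term IV, V, VI and VII. □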

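• The term IV. First, we have

$$(K_\ell)^{-1} = (K_{\delta,\ell} - \delta B_\ell)^{-1} = \big(I - \delta (K_{\delta,\ell})^{-1} B_\ell\big)^{-1} (K_{\delta,\ell})^{-1}.$$

On $\mathcal{A}_\ell = \{\|(K_{\delta,\ell})^{-1}\|_{op} \le \kappa_\ell\}$ and $\{\|\delta B_\ell\|_{op} \le \alpha_\ell\}$, since $\alpha_\ell$ satisfies $\kappa_\ell \alpha_\ell = \rho < \tfrac12$, by a usual Neumann series argument,

$$\big\|\big(I - \delta (K_{\delta,\ell})^{-1} B_\ell\big)^{-1}\big\|_{op} = \Big\|\sum_{i \ge 0} \big(\delta (K_{\delta,\ell})^{-1} B_\ell\big)^i\Big\|_{op} \le \sum_{i \ge 0} \|(K_{\delta,\ell})^{-1}\|_{op}^i\, \|\delta B_\ell\|_{op}^i \le \sum_{i \ge 0} \rho^i = (1 - \rho)^{-1}.$$

Therefore, on $\mathcal{A}_\ell$ and $\{\|\delta B_\ell\|_{op} \le \alpha_\ell\}$, we have

$$\|(K_\ell)^{-1}\|_{op} \le (1 - \rho)^{-1} \|(K_{\delta,\ell})^{-1}\|_{op} \le (1 - \rho)^{-1} \kappa_\ell. \tag{5.8}$$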
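The Neumann series step used for (5.8) relies on the elementary fact that $\|(I - M)^{-1}\|_{op} \le (1 - \rho)^{-1}$ whenever $\|M\|_{op} \le \rho < 1$. A quick numerical sanity check is sketched below; the function name `neumann_check`, the dimension and the value of $\rho$ are arbitrary illustrative choices.

```python
import numpy as np

def neumann_check(dim=5, rho=0.4, n_rep=200, seed=0):
    """Largest observed ||(I - M)^{-1}||_op over random M with ||M||_op = rho."""
    rng = np.random.default_rng(seed)
    worst = 0.0
    for _ in range(n_rep):
        m = rng.standard_normal((dim, dim))
        m *= rho / np.linalg.norm(m, ord=2)   # rescale so that ||M||_op = rho
        inv_norm = np.linalg.norm(np.linalg.inv(np.eye(dim) - m), ord=2)
        worst = max(worst, float(inv_norm))
    return worst, 1.0 / (1.0 - rho)

worst, bound = neumann_check()
print(f"largest observed norm {worst:.3f}, Neumann bound {bound:.3f}")
```

The observed norm never exceeds the geometric-series bound, which is exactly what is needed to pass from $\|(K_{\delta,\ell})^{-1}\|_{op}$ to $\|(K_\ell)^{-1}\|_{op}$.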

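Second, we now write

$$(K_{\delta,\ell})^{-1} = \big(I + (K_\ell)^{-1} \delta B_\ell\big)^{-1} (K_\ell)^{-1},$$

hence, on $\mathcal{A}_\ell$ and $\{\|\delta B_\ell\|_{op} \le \alpha_\ell\}$, we have by (5.8)

$$\|(K_\ell)^{-1} \delta B_\ell\|_{op} \le (1 - \rho)^{-1} \kappa_\ell \alpha_\ell = \frac{\rho}{1 - \rho} < 1$$

since $\rho < \tfrac12$ by assumption. The same Neumann series argument now entails

$$\|(K_{\delta,\ell})^{-1}\|_{op} \lesssim \|(K_\ell)^{-1}\|_{op}. \tag{5.9}$$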
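We are ready to bound the term IV itself. We have

$$\|\delta (K_{\delta,\ell})^{-1} B_\ell f_\ell\|^2\, 1_{\mathcal{A}_\ell}\, 1_{\{\|\delta B_\ell\|_{op} \le \alpha_\ell\}} \lesssim \|(K_\ell)^{-1}\|_{op}^2\, \kappa_\ell^{-2}\, \|f_\ell\|^2\, 1_{\{\|(K_\ell)^{-1}\|_{op} \le (1 - \rho)^{-1} \kappa_\ell\}},$$

where we successively used (5.8) and (5.9). It follows that

$$E[IV] \lesssim \sum_{\ell=1}^{L} \|(K_\ell)^{-1}\|_{op}^2\, \kappa_\ell^{-2}\, \|f_\ell\|^2\, 1_{\{\|(K_\ell)^{-1}\|_{op} \le (1 - \rho)^{-1} \kappa_\ell\}} \lesssim \sum_{\ell=1}^{L} \ell^{2\nu} \kappa_\ell^{-2} \|f_\ell\|^2,$$

where we used Assumption 2.2. The bound is uniform in $K \in \mathcal{G}^\nu(Q)$. By definition of $\kappa_\ell$ and using that $|\Lambda_\ell|$ is of order $\ell^{d-1}$, we derive

$$E[IV] \lesssim \big((\delta^2 |\log \delta|) \vee n^{-1}\big) \sum_{\ell=1}^{L} \ell^{2\nu + d - 1} \|f_\ell\|^2.$$

If $2\nu + d - 1 < 2s$, we have

$$\sum_{\ell=1}^{L} \ell^{2\nu + d - 1} \|f_\ell\|^2 \le \|f\|_{W^s}^2,$$

therefore

$$E[IV] \lesssim \delta^2 |\log \delta| + L^{-2s} \lesssim (\delta^2 |\log \delta|)^{1 \wedge 2s/(2\nu + d - 1)} \vee n^{-2s/(2\nu + d)} \tag{5.10}$$

by definition of $L$ in (3.2), and this result is uniform in $f \in W^s(M)$. If $2\nu + d - 1 > 2s$, we have

$$\sum_{\ell=1}^{L} \ell^{2\nu + d - 1} \|f_\ell\|^2 \le L^{2(\nu - s) + d - 1} \sum_{\ell=1}^{L} \ell^{2s} \|f_\ell\|^2.$$

By definition of $L$ again we derive

$$E[IV] \lesssim L^{-2s} L^{2\nu + d - 1} \big(n^{-1} \vee \delta^2 |\log \delta|\big) \lesssim L^{-2s} \lesssim (\delta^2 |\log \delta|)^{2s/(2\nu + d - 1)} \vee n^{-2s/(2\nu + d)} \tag{5.11}$$

and this bound is uniform in $f \in W^s(M)$. Putting together (5.10) and (5.11), we finally obtain

$$E[IV] \lesssim (\delta^2 |\log \delta|)^{1 \wedge 2s/(2\nu + d - 1)} \vee n^{-2s/(2\nu + d)} \tag{5.12}$$

uniformly in $f \in W^s(M)$, $K \in \mathcal{G}^\nu(Q)$. □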

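• The term V. We have

$$n^{-1} \|(K_{\delta,\ell})^{-1} \eta_\ell\|^2\, 1_{\mathcal{A}_\ell}\, 1_{\{\|\delta B_\ell\|_{op} \le \alpha_\ell\}}\, 1_{\{\|K_\ell f_\ell\| \ge \tau_\ell/2\}} \lesssim n^{-1} \|(K_\ell)^{-1}\|_{op}^2 \|\eta_\ell\|^2\, 1_{\{\|K_\ell f_\ell\| \ge \tau_\ell/2\}} \lesssim n^{-1} \ell^{2\nu} \|\eta_\ell\|^2\, 1_{\{\|K_\ell f_\ell\| \ge \tau_\ell/2\}},$$

where we successively used (5.8) and (5.9) in the same way as for the term IV, the last inequality being obtained thanks to Assumption 2.2. By Assumption 2.2 again, since

$$\|K_\ell f_\ell\| \le \|K_\ell\|_{op} \|f_\ell\| \le Q_1(K)\, \ell^{-\nu} \|f_\ell\|,$$

we derive

$$1_{\{\|K_\ell f_\ell\| \ge \tau_\ell/2\}} \le 1_{\{\|f_\ell\| \ge Q_1(K)^{-1} \ell^{\nu} \tau_\ell/2\}} = 1_{\{\|f_\ell\| \ge c\, \ell^{\nu + (d-1)/2} n^{-1/2} (\log n)^{1/2}\}}$$

for some constant $c$ that depends on $Q_1(K)$ and the pre-factor $\mu_0$ in the choice of $\tau_\ell$ only. Since $E[\|\eta_\ell\|^2] = |\Lambda_\ell| \lesssim \ell^{d-1}$, we infer, for any $1 \le k \le L$,

$$\begin{aligned} E[V] &\lesssim n^{-1} \sum_{\ell=1}^{L} \ell^{2\nu + d - 1}\, 1_{\{\|f_\ell\| \ge c\, \ell^{\nu + (d-1)/2} n^{-1/2} (\log n)^{1/2}\}} \\ &\lesssim n^{-1} \Big(\sum_{\ell=1}^{k} \ell^{2\nu + d - 1} + \sum_{\ell=k+1}^{L} n (\log n)^{-1} \|f_\ell\|^2\Big) \\ &\lesssim n^{-1} k^{2\nu + d} + (\log n)^{-1} \sum_{\ell > k} \|f_\ell\|^2. \end{aligned}$$

The admissible choice $k = \lfloor (n (\log n)^{-1})^{1/(2(s+\nu)+d)} \wedge (\delta^2)^{-1/(2\nu + d - 1)} \rfloor$ yields

$$E[V] \lesssim n^{-1} k^{2\nu + d} + k^{-2s} \lesssim (n^{-1} \log n)^{2s/(2(s+\nu)+d)} \vee (\delta^2)^{2s/(2\nu + d - 1)} \tag{5.13}$$

uniformly in $f \in W^s(M)$, $K \in \mathcal{G}^\nu(Q)$. □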

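• The term VI. We further bound the term VI via

$$VI \le VIII + IX,$$

with

$$VIII = \sum_{\ell=1}^{L} \|\delta (K_{\delta,\ell})^{-1} B_\ell f_\ell\|^2\, 1_{\mathcal{A}_\ell}\, 1_{\{\|\delta B_\ell\|_{op} > \alpha_\ell\}}, \qquad IX = \sum_{\ell=1}^{L} \|\delta (K_{\delta,\ell})^{-1} B_\ell f_\ell\|^2\, 1_{\mathcal{A}_\ell \cap \mathcal{B}_\ell}\, 1_{\{\|K_\ell f_\ell\| < \tau_\ell/2\}}.$$

On $\mathcal{A}_\ell$, we have

$$\|\delta (K_{\delta,\ell})^{-1} B_\ell f_\ell\|^2 \le \delta^2 \kappa_\ell^2 \|B_\ell\|_{op}^2 \|f_\ell\|^2,$$

hence

$$\begin{aligned} E[VIII] &\le \delta^2 \sum_{\ell=1}^{L} \kappa_\ell^2 \|f_\ell\|^2\, E\big[\|B_\ell\|_{op}^2\, 1_{\{\|\delta B_\ell\|_{op} > \alpha_\ell\}}\big] \\ &\le \delta^2 \sum_{\ell=1}^{L} \kappa_\ell^2 \|f_\ell\|^2\, E\big[\|B_\ell\|_{op}^4\big]^{1/2}\, P\big(\|\delta B_\ell\|_{op} > \alpha_\ell\big)^{1/2} \\ &\lesssim \delta^2 \sum_{\ell=1}^{L} \kappa_\ell^2 \|f_\ell\|^2\, |\Lambda_\ell|\, \delta^{c_0 \rho^2 |\Lambda_\ell|^2 / (2\lambda_0^2)} \\ &\lesssim |\log \delta|^{-1} \max_{\ell \le L} \delta^{c_0 \rho^2 |\Lambda_\ell|^2 / (2\lambda_0^2)}\, \|f\|_{L^2}^2, \end{aligned}$$

applying successively Cauchy-Schwarz, the moment bound (5.4) and Lemma 5.1. Indeed, since $\alpha_\ell = \rho/\kappa_\ell$, by definition of $\kappa_\ell$ in (2.5), we infer

$$\begin{aligned} P\big(\|\delta B_\ell\|_{op} > \alpha_\ell\big) &\le P\big(|\Lambda_\ell|^{-1/2} \|B_\ell\|_{op} > |\Lambda_\ell|^{-1/2} \delta^{-1} \rho\, \kappa_\ell^{-1}\big) \\ &= P\big(|\Lambda_\ell|^{-1/2} \|B_\ell\|_{op} > \tfrac{\rho}{\lambda_0} |\log \delta|^{1/2}\big) \\ &\le \exp\big(-c_0 \tfrac{\rho^2}{\lambda_0^2} |\log \delta|\, |\Lambda_\ell|^2\big) = \delta^{c_0 \rho^2 |\Lambda_\ell|^2 / \lambda_0^2} \end{aligned} \tag{5.14}$$

by (5.3) of Lemma 5.1 since $\tfrac{\rho}{\lambda_0} |\log \delta|^{1/2} \ge \beta_0$ for sufficiently small $\lambda_0$ thanks to the assumption $\delta \le \delta_0 < 1$. Finally, since $\Lambda_\ell$ is non-empty, by taking $\lambda_0$ sufficiently small, we conclude

$$E[VIII] \lesssim \delta^2 \tag{5.15}$$

uniformly in $f \in W^s(M)$. We now turn to the term IX. Observe first that

$$1_{\mathcal{B}_\ell}\, 1_{\{\|K_\ell f_\ell\| < \tau_\ell/2\}} \le 1_{\{n^{-1/2} \|\eta_\ell\| \ge \tau_\ell/2\}}. \tag{5.16}$$

We reproduce the steps we used for the term VIII, replacing the event $\{\|\delta B_\ell\|_{op} > \alpha_\ell\}$ by $\{n^{-1/2} \|\eta_\ell\| \ge \tau_\ell/2\}$. We obtain

$$E[IX] \lesssim \delta^2 \sum_{\ell=1}^{L} \kappa_\ell^2 \|f_\ell\|^2\, |\Lambda_\ell|\, P\big(n^{-1/2} \|\eta_\ell\| \ge \tfrac{\tau_\ell}{2}\big)^{1/2}.$$

By definition of $\tau_\ell$ in (2.6) and Lemma 5.2, we have

$$P\big(n^{-1/2} \|\eta_\ell\| \ge \tfrac{\tau_\ell}{2}\big) = P\big(|\Lambda_\ell|^{-1/2} \|\eta_\ell\| \ge \tfrac{\mu_0}{2} (\log n)^{1/2}\big) \le \exp\big(-c_1 \tfrac{\mu_0^2}{4} \log n\big) = n^{-c_1 \mu_0^2 / 4} \tag{5.17}$$

since $\tfrac{\mu_0}{2} (\log n)^{1/2} \ge \beta_1$ for large enough $\mu_0$. It follows that

$$E[IX] \lesssim |\log \delta|^{-1} \|f\|_{L^2}^2\, n^{-c_1 \mu_0^2 / 8} \lesssim n^{-1} |\log \delta|^{-1} \tag{5.18}$$

by taking $\mu_0$ sufficiently large. The bound is uniform in $f \in W^s(M)$. Putting together the estimates (5.15) and (5.18), we derive

$$E[VI] \lesssim \delta^2 + n^{-1} |\log \delta|^{-1} \tag{5.19}$$

for large enough $n$, uniformly in $f \in W^s(M)$. □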

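• The term VII. The arguments needed here are quite similar to those we used for the term VI. On $\mathcal{A}_\ell$, we have

$$n^{-1} \|(K_{\delta,\ell})^{-1} \eta_\ell\|^2 \le n^{-1} \kappa_\ell^2 \|\eta_\ell\|^2,$$

hence, using (5.16), the fact that $E[\|\eta_\ell\|^2] = |\Lambda_\ell| \lesssim \ell^{d-1}$ together with $\kappa_\ell \le n^{1/2}$ by definition (2.5), we successively obtain

$$\begin{aligned} E[VII] &\lesssim n^{-1} \sum_{\ell=1}^{L} \kappa_\ell^2\, E\big[\|\eta_\ell\|^2\big] \Big(P\big(\|\delta B_\ell\|_{op} > \alpha_\ell\big) + P\big(n^{-1/2} \|\eta_\ell\| \ge \tfrac{\tau_\ell}{2}\big)\Big) \\ &\lesssim \sum_{\ell=1}^{L} \ell^{d-1} \max_{\ell \le L} \Big\{P\big(\|\delta B_\ell\|_{op} > \alpha_\ell\big) + P\big(n^{-1/2} \|\eta_\ell\| \ge \tfrac{\tau_\ell}{2}\big)\Big\} \\ &\lesssim L^{d} \big(\delta^{c_0 \rho^2 / \lambda_0^2} + n^{-c_1 \mu_0^2 / 4}\big), \end{aligned}$$

where we applied (5.14) and (5.17) to obtain the last inequality. The choice of $L$ in (3.2) leads to

$$E[VII] \lesssim \delta^2 \vee n^{-1} \tag{5.20}$$

by taking $\lambda_0$ sufficiently small and $\mu_0$ sufficiently large. □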

• The term I, conclusion. We put together the estimates (5.12), (5.13), (5.19) and (5.20). We obtain

$$E[I] \lesssim (\delta^2 |\log \delta|)^{1 \wedge 2s/(2\nu + d - 1)} \vee (n^{-1} \log n)^{2s/(2(s+\nu)+d)} \tag{5.21}$$

uniformly in $f \in W^s(M)$. □

• The term II. We claim the following inequality

$$1_{\mathcal{A}_\ell^c} \le 1_{\{\|(K_\ell)^{-1}\|_{op} \ge \kappa_\ell/2\}} + 1_{\{\|K_{\delta,\ell} - K_\ell\|_{op} \ge \kappa_\ell^{-1}\}}, \tag{5.22}$$

a consequence of the following elementary lemma.

Lemma 5.3. Let $A$ and $B$ be two bounded operators with bounded inverse. If $\|B^{-1}\| \ge \kappa$ for some $\kappa > 0$, then either $\|A^{-1}\| \ge \kappa/2$ or $\|A - B\| \ge 1/\kappa$.

Proof of Lemma 5.3. Write $B = A + \xi$. Assume that $\|A^{-1}\| < \kappa/2$. By the triangle inequality, $\|(A + \xi)^{-1} - A^{-1}\| \ge \kappa/2$. We proceed by contradiction: suppose that $\|\xi\| < 1/\kappa$. Then we have $\|A^{-1} \xi\| \le \|A^{-1}\| \|\xi\| < 1/2 < 1$ and a standard Neumann series argument entails

$$\begin{aligned} \|(A + \xi)^{-1} - A^{-1}\| &= \|(I + A^{-1} \xi)^{-1} A^{-1} - A^{-1}\| \\ &= \Big\|\sum_{i \ge 1} (-1)^i (A^{-1} \xi)^i A^{-1}\Big\| \\ &\le \|A^{-1}\| \sum_{i \ge 1} \|A^{-1} \xi\|^i \\ &< \frac{\kappa}{2} \sum_{i \ge 1} 2^{-i} = \frac{\kappa}{2}, \end{aligned}$$

a contradiction. □
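Lemma 5.3 can be checked numerically on random matrices by taking $\kappa = \|B^{-1}\|_{op}$, the largest admissible value, and verifying that at least one of the two alternatives holds. The sketch below is only a sanity check on finite-dimensional examples; the helper names `op_norm` and `lemma_53_holds`, the matrix size and the relative tolerance are hypothetical choices.

```python
import numpy as np

def op_norm(m):
    """Operator (spectral) norm of a square matrix."""
    return float(np.linalg.norm(m, ord=2))

def lemma_53_holds(a, b, tol=1e-9):
    """With kappa = ||B^{-1}||, check ||A^{-1}|| >= kappa/2 or ||A - B|| >= 1/kappa."""
    kappa = op_norm(np.linalg.inv(b))
    first = op_norm(np.linalg.inv(a)) >= (kappa / 2.0) * (1.0 - tol)
    second = op_norm(a - b) >= (1.0 / kappa) * (1.0 - tol)
    return bool(first or second)

rng = np.random.default_rng(0)
checks = [lemma_53_holds(rng.standard_normal((4, 4)), rng.standard_normal((4, 4)))
          for _ in range(500)]
print(all(checks))
```

In the proof of (5.22) the lemma is applied with $A = K_\ell$ and $B = K_{\delta,\ell}$, so that $A - B = -\delta B_\ell$.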
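By Assumption 2.2, we have $\|(K_\ell)^{-1}\|_{op} \le Q_2(K)\, \ell^{\nu}$. Therefore

$$1_{\{\|(K_\ell)^{-1}\|_{op} \ge \kappa_\ell/2\}} \le 1_{\{\ell \ge c\, (\delta^2 |\log \delta|)^{-1/(2\nu + d - 1)} \wedge n^{1/(2\nu)}\}}$$

for some constant $c$ that depends on $Q_2(K)$ and $\lambda_0$ only. For the second term in the right-hand side of (5.22), we apply Lemma 5.1 in the same way as we obtained (5.15) for the term VIII. We derive

$$P\big(\|\delta B_\ell\|_{op} \ge \kappa_\ell^{-1}\big) = P\big(|\Lambda_\ell|^{-1/2} \|B_\ell\|_{op} \ge \lambda_0^{-1} |\log \delta|^{1/2}\big) \le \exp\big(-\tfrac{c_0}{\lambda_0^2} |\log \delta|\, |\Lambda_\ell|^2\big) = \delta^{c_0 |\Lambda_\ell|^2 / \lambda_0^2} \le \delta^2$$

for small enough $\lambda_0$. Therefore

$$E[II] \lesssim \big(n^{-s/\nu} \vee (\delta^2 |\log \delta|)^{2s/(2\nu + d - 1)}\big) \|f\|_{W^s}^2 + \|f\|_{L^2}^2\, \delta^2.$$

We finally obtain

$$E[II] \lesssim (\delta^2 |\log \delta|)^{2s/(2\nu + d - 1)} + \delta^2 + n^{-s/\nu} \lesssim (\delta^2 |\log \delta|)^{1 \wedge 2s/(2\nu + d - 1)} \vee (n^{-1} \log n)^{2s/(2(s+\nu)+d)} \tag{5.23}$$

uniformly in $f \in W^s(M)$, $K \in \mathcal{G}^\nu(Q)$. □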
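• The term III. Obviously, the decomposition (5.5) entails

$$1_{\mathcal{B}_\ell^c} = 1_{\{\|K_\ell f_\ell + n^{-1/2} \eta_\ell\| < \tau_\ell\}} \le 1_{\{\|K_\ell f_\ell\| < 2\tau_\ell\}} + 1_{\{n^{-1/2} \|\eta_\ell\| \ge \tau_\ell\}}.$$

On the one hand, we have

$$\|K_\ell f_\ell\| \ge \|(K_\ell)^{-1}\|_{op}^{-1} \|f_\ell\| \ge Q_2(K)^{-1} \ell^{-\nu} \|f_\ell\|$$

by Assumption 2.2. By definition of $\tau_\ell$ in (2.6) it follows that, for any $1 \le k \le L$,

$$\begin{aligned} \sum_{\ell=1}^{L} \|f_\ell\|^2\, 1_{\{\|K_\ell f_\ell\| < 2\tau_\ell\}} &\le \sum_{\ell=1}^{L} \|f_\ell\|^2\, 1_{\{\|f_\ell\| < 2 Q_2(K) \ell^{\nu} \tau_\ell\}} \\ &\le \sum_{\ell=1}^{k} 4 Q_2(K)^2 \ell^{2\nu} \tau_\ell^2 + \sum_{\ell=k+1}^{L} \|f_\ell\|^2 \\ &\lesssim (n^{-1} \log n) \sum_{\ell=1}^{k} \ell^{2\nu + d - 1} + \|f\|_{W^s}^2 k^{-2s} \\ &\lesssim (n^{-1} \log n)\, k^{2\nu + d} + \|f\|_{W^s}^2 k^{-2s}. \end{aligned}$$

The choice $k = \lfloor (n (\log n)^{-1})^{1/(2(s+\nu)+d)} \rfloor$ yields

$$\sum_{\ell=1}^{L} \|f_\ell\|^2\, 1_{\{\|K_\ell f_\ell\| < 2\tau_\ell\}} \lesssim (n^{-1} \log n)^{2s/(2(s+\nu)+d)} \tag{5.24}$$

uniformly in $f \in W^s(M)$, $K \in \mathcal{G}^\nu(Q)$. On the other hand, by (5.17), we have

$$\sum_{\ell=1}^{L} \|f_\ell\|^2\, P\big(n^{-1/2} \|\eta_\ell\| \ge \tau_\ell\big) \le \|f\|_{L^2}^2\, n^{-c_1 \mu_0^2 / 4} \lesssim n^{-1}$$

by taking $\mu_0$ large enough, uniformly in $f \in W^s(M)$. Combining this last estimate with (5.24) we infer

$$E[III] \lesssim (n^{-1} \log n)^{2s/(2(s+\nu)+d)} + n^{-1} \tag{5.25}$$

uniformly in $f \in W^s(M)$, $K \in \mathcal{G}^\nu(Q)$. □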

Proof of Theorem 3.1, completion. It remains to piece together the esti- 
mates (5.7), (5.21), (5.23) and (5.25). □ 
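To get a feeling for which of the two regimes dominates after the estimates are pieced together, the resulting bound can be evaluated numerically. The helper below is a hypothetical illustration (the name `upper_rate` is not from the paper); it evaluates the right-hand side of (5.21) and (5.23), $(\delta^2 |\log \delta|)^{1 \wedge 2s/(2\nu+d-1)} \vee (n^{-1} \log n)^{2s/(2(s+\nu)+d)}$, with all multiplicative constants ignored, and it assumes $0 < \delta < 1$, $n \ge 2$ and $2\nu + d - 1 > 0$.

```python
import math

def upper_rate(delta, n, s, nu, d):
    """Order of the upper bound: operator-noise term versus signal-noise term."""
    op_exponent = min(1.0, 2.0 * s / (2.0 * nu + d - 1.0))
    op_term = (delta ** 2 * abs(math.log(delta))) ** op_exponent
    sig_term = (math.log(n) / n) ** (2.0 * s / (2.0 * (s + nu) + d))
    return max(op_term, sig_term)

# Operator noise dominant (n very large): parametric regime for nu = 1, s = 4.5, d = 1
print(upper_rate(1e-2, 10 ** 12, 4.5, 1.0, 1))
# Same noise levels with a more ill-posed operator (nu = 8): slower rate
print(upper_rate(1e-2, 10 ** 12, 4.5, 8.0, 1))
```

When $2\nu + d - 1 \le 2s$ and the signal noise is negligible, the first term reduces to $\delta^2 |\log \delta|$, the parametric rate up to the logarithmic factor.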
5.3 Proof of Theorem 3.2

Preliminaries: a Bayesian inequality

For every $\ell \ge 1$, denote by $\mathcal{M}_\ell$ the set of $|\Lambda_\ell| \times |\Lambda_\ell|$ matrices. We denote by $\mathcal{M}_\ell^\nu(Q)$ the subset of $\mathcal{M}_\ell$ of matrices $K_\ell$ such that

$$\|K_\ell\|_{op} \le Q_2\, \ell^{-\nu} \quad \text{and} \quad \|(K_\ell)^{-1}\|_{op} \le Q_1\, \ell^{\nu}.$$

Define

$$K_\ell^0 = c_1 \ell^{-\nu} I_\ell \tag{5.26}$$

where $I_\ell$ denotes the identity in $\mathcal{M}_\ell$ and $c_1 > 0$ is such that

$$1/Q_1 < c_1 < Q_2$$

so that $K_\ell^0 \in \mathcal{M}_\ell^\nu(Q)$. We assume a Bayesian approach and pick $K_\ell$ at random, with

$$K_\ell = K_\ell^0 + c_2\, \delta\, W_\ell,$$

for some $c_2 > 0$ and where $W_\ell$ is an independent copy of $B_\ell$. Define $g_\ell = (1\ 0\ \ldots\ 0)^T$ as the first canonical (column) vector in $\mathbb{R}^{|\Lambda_\ell|}$. Define also

$$\vartheta = -(K_\ell^0)^{-1} (K_\ell - K_\ell^0) (K_\ell^0)^{-1} g_\ell, \tag{5.27}$$

and

$$X = -(K_\ell^0)^{-1} (K_{\delta,\ell} - K_\ell^0) (K_\ell^0)^{-1} g_\ell. \tag{5.28}$$

Lemma 5.4. There exists a constant $c_3$ depending on $\nu$, $Q$ and $c_2$ only such that

$$\inf_T P\big(\delta^{-2} \ell^{-4\nu} |\Lambda_\ell|^{-1} \|T(X) - \vartheta\|^2 \ge c_3\big) \ge \tfrac12, \tag{5.29}$$

where the infimum is taken among all estimators $T$ based on the observation $X$.

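Proof of Lemma 5.4. We have $X = \vartheta + \varepsilon$, with

$$\vartheta = -(K_\ell^0)^{-1} c_2 \delta W_\ell (K_\ell^0)^{-1} g_\ell \quad \text{and} \quad \varepsilon = -(K_\ell^0)^{-1} \delta B_\ell (K_\ell^0)^{-1} g_\ell.$$

By construction, $\vartheta$ and $\varepsilon$ are two independent Gaussian random vectors. More precisely, by definition of $g_\ell$ and with obvious notation, we have

$$\vartheta \sim \mathcal{N}\big(0, \delta^2 c_2^2 c_1^{-4} \ell^{4\nu} I_\ell\big) \quad \text{and} \quad \varepsilon \sim \mathcal{N}\big(0, \delta^2 c_1^{-4} \ell^{4\nu} I_\ell\big).$$

It readily follows that the posterior law of $\vartheta$ given $X$ is

$$\mathcal{L}(\vartheta \mid X) = \mathcal{N}\Big(\frac{c_2^2}{1 + c_2^2} X,\ \delta^2 c_1^{-4} \ell^{4\nu} \frac{c_2^2}{1 + c_2^2} I_\ell\Big).$$

Now, for $c_3 > 0$, define

$$H_\delta(c_3, x) = 1_{\{\delta^{-2} \ell^{-4\nu} |\Lambda_\ell|^{-1} \|x\|^2 \ge c_3\}} \quad \text{for } x \in \mathbb{R}^{|\Lambda_\ell|}.$$

Setting $z(X) = T(X) - E[\vartheta \mid X]$, we have

$$E\big[H_\delta(c_3, T(X) - \vartheta) \mid X\big] = E\big[H_\delta(c_3, z(X) + E[\vartheta \mid X] - \vartheta) \mid X\big] \ge E\big[H_\delta(c_3, E[\vartheta \mid X] - \vartheta) \mid X\big],$$

where we used a version of Anderson's Lemma given in Lemma 10.2 in [19] p. 157. Indeed, the law of $E[\vartheta \mid X] - \vartheta$ has a centrally symmetric density and the function $H_\delta$ is nonnegative, centrally symmetric, satisfies $H_\delta(0) = 0$ and the sets $\{x, H_\delta(c_3, x) \le c\}$ are convex for any $c \ge 0$.

Now, $\|E[\vartheta \mid X] - \vartheta\|^2$ has a $\chi^2$-distribution with $|\Lambda_\ell|$ degrees of freedom, up to a scaling factor of order $\delta^2 \ell^{4\nu}$. This means that the sequence of random variables $\delta^{-2} \ell^{-4\nu} |\Lambda_\ell|^{-1} \|E[\vartheta \mid X] - \vartheta\|^2$ is bounded below in probability in $\ell \ge 1$ and $\delta > 0$. Since $E[\vartheta \mid X] - \vartheta$ is moreover independent of $X$, it follows that there exists $c_3$ independent of $\delta$ and $\ell$ such that

$$E\big[H_\delta(c_3, E[\vartheta \mid X] - \vartheta) \mid X\big] \ge \tfrac12.$$

Integrating with respect to $X$, we obtain (5.29) and the result follows. □

Proof of Theorem 3.2

We assume with no loss of generality that $2\nu + d - 1 \ge 2s$. (Otherwise, the lower bound $\delta^2$ trivially follows from the parametric case.) Let $\Pi^{s,\nu}(M, Q_1)$ denote the set of sequences $\pi = (\pi_\ell)_{\ell \ge 1}$ satisfying

$$\sum_{\ell \ge 1} \ell^{2(s+\nu)} \pi_\ell^2 \le M^2 / Q_1^2. \tag{5.30}$$

For $\pi \in \Pi^{s,\nu}(M, Q_1)$ and $K \in \mathcal{G}^\nu(Q)$, define $f$ via its coordinates in $H_\ell$ by

$$f_\ell = \pi_\ell K_\ell^{-1} g_\ell, \quad \ell \ge 1,$$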
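where $g_\ell$ is an arbitrary vector in $\mathbb{R}^{|\Lambda_\ell|}$ with $\|g_\ell\| = 1$ (fixed in the sequel). Then

$$\sum_{\ell \ge 1} \ell^{2s} \|f_\ell\|^2 = \sum_{\ell \ge 1} \ell^{2s} \pi_\ell^2 \|K_\ell^{-1} g_\ell\|^2 \le Q_1^2 \sum_{\ell \ge 1} \ell^{2(s+\nu)} \pi_\ell^2 \le M^2$$

since $\pi \in \Pi^{s,\nu}(M, Q_1)$. Therefore $f \in W^s(M)$. It follows that for an arbitrary estimator $\hat f$, we have

$$\begin{aligned} \sup_{f \in W^s(M),\, K \in \mathcal{G}^\nu(Q)} E\big[\|\hat f - f\|_H^2\big] &= \sup_{f \in W^s(M),\, K \in \mathcal{G}^\nu(Q)} \sum_{\ell \ge 1} E\big[\|\hat f_\ell - f_\ell\|^2\big] \\ &\ge \sup_{\pi \in \Pi^{s,\nu}(M, Q_1),\, K \in \mathcal{G}^\nu(Q)} \sum_{\ell \ge 1} E\big[\|\hat f_\ell - \pi_\ell K_\ell^{-1} g_\ell\|^2\big]. \end{aligned}$$

Lemma 5.5. There exist a choice of $g_\ell$ with $\|g_\ell\| = 1$ and constants $c_4, c_5$ (depending on $s, \nu, M, Q$) such that for any $\pi \in \Pi^{s,\nu}(M, Q_1)$, if $|\Lambda_\ell|^{1/2} \delta \le c_4 \ell^{-\nu}$, we have

$$\inf_{\hat f_\ell} \sup_{K \in \mathcal{G}^\nu(Q)} E\big[\|\hat f_\ell - \pi_\ell K_\ell^{-1} g_\ell\|^2\big] \ge c_5\, \delta^2 \ell^{4\nu + d - 1} \pi_\ell^2 \tag{5.31}$$

where the infimum is taken over all estimators and provided $\delta > 0$ is sufficiently small.

With (5.31), we easily conclude: Define $L = \lfloor c_6\, \delta^{-2/(2\nu + d - 1)} \rfloor$ with $c_6 > 0$. For $1 \le \ell \le L$, the assumption $|\Lambda_\ell|^{1/2} \delta \le c_4 \ell^{-\nu}$ of Lemma 5.5 is satisfied by picking $c_6 > 0$ sufficiently small and we have

$$\begin{aligned} \sup_{\pi \in \Pi^{s,\nu}(M, Q_1),\, K \in \mathcal{G}^\nu(Q)} \sum_{\ell \ge 1} E\big[\|\hat f_\ell - \pi_\ell K_\ell^{-1} g_\ell\|^2\big] &\ge c_5\, \delta^2 \sup_{\pi \in \Pi^{s,\nu}(M, Q_1)} \sum_{\ell=1}^{L} \ell^{4\nu + d - 1} \pi_\ell^2 \\ &\ge c_5\, \delta^2 \frac{M^2}{Q_1^2} L^{2\nu + d - 1 - 2s} \ge c_5\, c_6^{2\nu + d - 1 - 2s} \frac{M^2}{Q_1^2}\, \delta^{4s/(2\nu + d - 1)} \end{aligned}$$

thanks to the admissible choice $\pi$ specified by $\pi_\ell^2 = \ell^{-2(\nu + s)} M^2 / Q_1^2$ if $\ell = L$ and $0$ otherwise. Theorem 3.2 follows. It remains to prove Lemma 5.5.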
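Proof of Lemma 5.5. In view of (5.31), we may (and will) assume that $\pi_\ell = 1$. We rely on the notation and definition of the preliminaries. Observe first that

$$\inf_{\hat f_\ell} \sup_{K \in \mathcal{G}^\nu(Q)} E\big[\|\hat f_\ell - K_\ell^{-1} g_\ell\|^2\big] = \inf_{\hat f_\ell} \sup_{K \in \mathcal{G}^\nu(Q)} E\big[\|\hat f_\ell - (K_\ell^{-1} - (K_\ell^0)^{-1}) g_\ell\|^2\big],$$

where $K_\ell^0$ is defined in (5.26). Put $v_{\delta,\ell} = \delta^2 \ell^{4\nu + d - 1}$. For any $c > 0$, by Chebyshev inequality, we have

$$c^{-2} v_{\delta,\ell}^{-1} \inf_{\hat f_\ell} \sup_{K \in \mathcal{G}^\nu(Q)} E\big[\|\hat f_\ell - (K_\ell^{-1} - (K_\ell^0)^{-1}) g_\ell\|^2\big] \ge \inf_{\hat f_\ell} \sup_{K \in \mathcal{G}^\nu(Q)} P\big(\|\hat f_\ell - (K_\ell^{-1} - (K_\ell^0)^{-1}) g_\ell\| \ge c\, v_{\delta,\ell}^{1/2}\big). \tag{5.32}$$

We adopt the same Bayesian approach as in the preliminaries and consider $K_\ell$ as a random matrix with distribution such that

$$K_\ell = K_\ell^0 + c_2\, \delta\, W_\ell, \tag{5.33}$$

where $W_\ell$ is an independent copy of $B_\ell$ and $c_2 > 0$ is to be specified later. Using the randomisation (5.33) on $K_\ell$, the right-hand side in (5.32) is now bigger than

$$\inf_{\hat f_\ell} P\big(\|\hat f_\ell - (K_\ell^{-1} - (K_\ell^0)^{-1}) g_\ell\| \ge c\, v_{\delta,\ell}^{1/2}\big) - P\big(K_\ell \notin \mathcal{M}_\ell^\nu(Q)\big). \tag{5.34}$$

Let us first show that

$$\inf_{\hat f_\ell} P\big(\|\hat f_\ell - (K_\ell^{-1} - (K_\ell^0)^{-1}) g_\ell\| \ge c\, v_{\delta,\ell}^{1/2}\big) \tag{5.35}$$

is bounded below for an appropriate choice of $c > 0$. Introduce the event

$$\mathcal{A}_\delta = \{Q_1 \ell^{\nu} c_2\, \delta\, \|W_\ell\|_{op} \le \rho\}$$

for some $0 < \rho < 1$. Observe that $\|(K_\ell^0)^{-1} c_2 \delta W_\ell\|_{op} \le \rho$ on $\mathcal{A}_\delta$, therefore, by a usual Neumann series argument, we have the decomposition

$$K_\ell^{-1} - (K_\ell^0)^{-1} = -(K_\ell^0)^{-1} c_2 \delta W_\ell (K_\ell^0)^{-1} + \sum_{n \ge 2} (-1)^n \big((K_\ell^0)^{-1} c_2 \delta W_\ell\big)^n (K_\ell^0)^{-1}.$$

Applying the vector $g_\ell = (1, 0, \ldots, 0)^T$ and setting

$$\zeta_{\delta,\ell} = \sum_{n \ge 2} (-1)^n \big((K_\ell^0)^{-1} c_2 \delta W_\ell\big)^n (K_\ell^0)^{-1} g_\ell,$$

we obtain the decomposition

$$(K_\ell^{-1} - (K_\ell^0)^{-1}) g_\ell = \vartheta + \zeta_{\delta,\ell},$$

where $\vartheta$ is defined in (5.27). We derive, for any $c > 0$,

$$\begin{aligned} P\big(\|\hat f_\ell - (K_\ell^{-1} - (K_\ell^0)^{-1}) g_\ell\| \ge c\, v_{\delta,\ell}^{1/2}\big) &\ge P\big(\|\hat f_\ell - (\vartheta + \zeta_{\delta,\ell})\| \ge c\, v_{\delta,\ell}^{1/2} \text{ and } \mathcal{A}_\delta\big) \\ &\ge P\big(\|\hat f_\ell - \vartheta\| \ge \tfrac12 c\, v_{\delta,\ell}^{1/2} \text{ and } \mathcal{A}_\delta \text{ and } \|\zeta_{\delta,\ell}\| \le \tfrac12 c\, v_{\delta,\ell}^{1/2}\big) \end{aligned}$$

by the triangle inequality. We claim that for any $\varepsilon > 0$, there exists a choice of sufficiently small $c_2$ such that for any $c > 0$:

$$\limsup_{\delta \to 0} P\big(\mathcal{A}_\delta \text{ and } \|\zeta_{\delta,\ell}\| \le \tfrac12 c\, v_{\delta,\ell}^{1/2}\big) \ge 1 - \varepsilon. \tag{5.36}$$

Let us admit temporarily (5.36). For such a choice, we thus have

$$P\big(\|\hat f_\ell - (K_\ell^{-1} - (K_\ell^0)^{-1}) g_\ell\| \ge c\, v_{\delta,\ell}^{1/2}\big) \ge P\big(\|\hat f_\ell - \vartheta\| \ge \tfrac12 c\, v_{\delta,\ell}^{1/2}\big) - \varepsilon.$$

Let us now look at an apparently different problem: we want to estimate $\vartheta$ from our observation $K_{\delta,\ell}$, or equivalently, from the observation

$$-(K_\ell^0)^{-1} (K_{\delta,\ell} - K_\ell^0) (K_\ell^0)^{-1}.$$

The choice $g_\ell = (1, 0, \ldots, 0)^T$ entails that $-(K_\ell^0)^{-1} (K_{\delta,\ell} - K_\ell^0) (K_\ell^0)^{-1} g_\ell$ is a sufficient statistic, but this last quantity is precisely $X$ defined in (5.28). Thus, without loss of generality, $\hat f_\ell$ can be taken as an estimator of the form $T(X)$. By Lemma 5.4, we know that $v_{\delta,\ell}$ is a lower bound for estimating $\vartheta$.

More specifically, by taking $c$ such that $c \le 2\sqrt{c_3}$, we have

$$P\big(\|\hat f_\ell - \vartheta\| \ge \tfrac12 c\, v_{\delta,\ell}^{1/2}\big) - \varepsilon \ge \tfrac12 - \varepsilon \ge \tfrac14$$

say, since the choice of $\varepsilon$ is arbitrary, and (5.35) follows. It remains to prove (5.36).

First, we have that $|\Lambda_\ell|^{-1/2} \|W_\ell\|_{op}$ is bounded in probability by Lemma 5.1 in $\ell \ge 1$. Since $|\Lambda_\ell|^{1/2} \delta \le c_4 \ell^{-\nu}$ by assumption, we also have that $\ell^{\nu} \delta \|W_\ell\|_{op}$ is bounded in probability, hence the probability of $\mathcal{A}_\delta$ can be taken arbitrarily close to 1 by taking $c_2$ sufficiently small. Moreover, on $\mathcal{A}_\delta$, we have

$$\begin{aligned} \|\zeta_{\delta,\ell}\| &\le Q_1 \ell^{\nu} \sum_{n \ge 2} \big(Q_1 \ell^{\nu} c_2 \delta \|W_\ell\|_{op}\big)^n \\ &\le (1 - \rho)^{-1} c_2^2 Q_1^3\, \delta^2 \ell^{3\nu} \|W_\ell\|_{op}^2 \\ &\le (1 - \rho)^{-1} c_2^2 c_4 Q_1^3\, \delta\, \ell^{2\nu} |\Lambda_\ell|^{1/2} \big(|\Lambda_\ell|^{-1/2} \|W_\ell\|_{op}\big)^2, \end{aligned}$$

where we again used the fact that $|\Lambda_\ell|^{1/2} \delta \le c_4 \ell^{-\nu}$ by assumption. The claim follows from the fact that $|\Lambda_\ell|^{-1/2} \|W_\ell\|_{op}$ is bounded in probability. Hence (5.36) and (5.35) are proved.

In order to complete the proof of Lemma 5.5, we need to check that the term $P(K_\ell \notin \mathcal{M}_\ell^\nu(Q))$ can be taken arbitrarily small when bounding (5.32) below by (5.34). We have

$$P\big(K_\ell \notin \mathcal{M}_\ell^\nu(Q)\big) \le P\big(\|K_\ell\|_{op} > Q_2 \ell^{-\nu}\big) + P\big(\|K_\ell^{-1}\|_{op} > Q_1 \ell^{\nu}\big). \tag{5.37}$$

For the first term in the right-hand side of (5.37), we have

$$P\big(\|K_\ell\|_{op} > Q_2 \ell^{-\nu}\big) \le P\big(\|c_2 \delta W_\ell\|_{op} > Q_2 \ell^{-\nu} - \|K_\ell^0\|_{op}\big) = P\big(\|c_2 \delta W_\ell\|_{op} > (Q_2 - c_1) \ell^{-\nu}\big).$$

The last term can be rewritten as

$$P\big(|\Lambda_\ell|^{-1/2} \|W_\ell\|_{op} > (Q_2 - c_1)\, c_2^{-1} \ell^{-\nu} |\Lambda_\ell|^{-1/2} \delta^{-1}\big).$$

For the second term in the right-hand side of (5.37), thanks to the property $\|K_\ell^{-1}\|_{op} \le (c_1 \ell^{-\nu} - c_2 \delta \|W_\ell\|_{op})^{-1}$ we derive

$$P\big(\|K_\ell^{-1}\|_{op} > Q_1 \ell^{\nu}\big) \le P\big(|\Lambda_\ell|^{-1/2} \|W_\ell\|_{op} > (c_1 - Q_1^{-1})\, c_2^{-1} \ell^{-\nu} |\Lambda_\ell|^{-1/2} \delta^{-1}\big).$$

By assumption, we have that $\ell^{-\nu} |\Lambda_\ell|^{-1/2} \delta^{-1}$ is bounded away from zero. Since $|\Lambda_\ell|^{-1/2} \|W_\ell\|_{op}$ is tight in $\ell \ge 1$, we can conclude by taking $c_2$ sufficiently small. The proof of Lemma 5.5 is complete.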
□ 